Abstract



In this article, we analyze the morphometry (shape and size) of hippocampus in subjects with very mild 
dementia of Alzheimer's type (DAT) and nondemented controls and how the morphometry changes over 
a two-year period. Morphometric differences with respect to a template hippocampus were measured 
by the metric distance obtained from the Large Deformation Diffeomorphic Metric Mapping (LDDMM) 
algorithm which was previously used to calculate dense one-to-one correspondence vector fields between 
the shapes. LDDMM assigns metric distances on the space of anatomical images thereby allowing for 
the direct comparison and quantization of morphometric changes. We characterize what additional in- 
formation the metric distances provide in terms of size and shape given the volume measurements of the 
hippocampi. Moreover, we demonstrate how metric distances can be used in cross-sectional, longitudinal, 
and left-right asymmetry comparisons. We perform a principal component analysis on metric distances 
and hippocampus, brain, and intracranial volumes. We use repeated measures ANOVA models to test 
the main effects of and interaction between the diagnosis, duration, and hemisphere factors to see which 
factors significantly explain the differences in metric distances. When a factor is found to be significant, 
we use classical parametric and non-parametric tests to compare the metric distances for that factor. The 
analysis of metric distances is then used to compare the effects of aging in the hippocampus. At base- 
line, the metric distances for demented subjects are found not to be significantly different from those for 
nondemented subjects. At follow-up, the metric distances for demented subjects were significantly larger 
compared to nondemented subjects. The metric distances for demented subjects increased significantly 
from baseline to follow-up but not for nondemented subjects. We also demonstrate that metric distances 
can be used in a logistic regression model for diagnostic discrimination of subjects. We compare metric 
distances with the volumes and obtain similar results in cross-sectional and longitudinal comparisons. In 
classification, the model that uses volume, metric distance, and volume loss over time together performs 
better in detecting DAT. Thus, metric distances with respect to a template computed via LDDMM can 
be a powerful tool in detecting differences in shape in cross-sectional as well as longitudinal studies. 



1 Introduction 

Numerous post-mortem studies have shown that neurofibrillary tangles and amyloid plaques characteristic 
of Alzheimer's Disease (AD) are prominent within the hippocampus of individuals with mild dementia of 
the Alzheimer's type (DAT) and that the distribution of these neuropathological markers becomes more 
widespread to include several regions of the neocortex as the disease process progresses [1-7]. The accumulation
of neurofibrillary tangles and amyloid plaques characteristic of AD is associated with neuronal damage and
death [8]. Furthermore, macroscopic gray matter losses from the accumulation of microscopic scale neuronal
destruction are detectable in living subjects using currently available magnetic resonance (MR) imaging.
Specifically, volume losses within the hippocampus [9-14] have recently been reported in subjects with
mild-to-moderate AD. In an unusual study where the antemortem MR scans and post-mortem material were available
for the same subjects, hippocampal volume losses were shown to be powerful antemortem predictors of AD 
neuropathology [15]. Progressive atrophy of the entire brain has been observed in AD cases [16]. However, 
due to the complexity of the human brain and the non-uniform distribution of AD neuropathology early in 
the course of disease, detailed examination of specific brain regions known to be affected early in the AD 
disease process (e.g., hippocampus) may be preferred for distinguishing preclinical and very mild forms of AD 
from normal aging [17-19]. 

Methods developed in the field of Computational Anatomy (CA) that enable quantification of brain struc- 
ture volumes and shapes between and within groups of individuals with and without various neurological 
diseases have emerged from several groups in recent years [20-25]. Based on the mathematical principles of 
general pattern theory [18, 19, 23, 26, 27], these methods combine image-based diffeomorphic maps between 
MR scans with representations of brain structures as smooth manifolds. Because of their high repeatability 
and sensitivity to changes in neuroanatomical shapes, they can be especially sensitive to abnormalities of 
brain structures associated with early forms of AD. Using such methods, we previously demonstrated that 
the combined assessment of hippocampal volume loss and shape deformity optimally distinguished subjects 
with very mild DAT from both elder nondemented subjects and younger healthy subjects [10]. These methods 
also allowed us to demonstrate that hippocampal shape deformities associated with very mild DAT and
non-demented aging were distinct [28]. These methods were also extended to quantify changes in neuroanatomical
volumes and shapes within the same individuals over time [29]. Other longitudinal neuroimaging analyses of
hippocampal structures in individuals with AD have also emerged [30-41].



An important task in CA is the study of neuroanatomical variability. The anatomic model is a quadruple
$(\Omega, \mathcal{G}, \mathcal{I}, \mathcal{P})$ consisting of $\Omega$, the template coordinate space (in
$\mathbb{R}^3$), defined as the union of 0, 1, 2, and 3-dimensional manifolds; $\mathcal{G} : \Omega \leftrightarrow \Omega$,
a set of diffeomorphic transformations on $\Omega$; $\mathcal{I}$, the space of anatomies, which is the orbit of a
template anatomy $I_0$ under $\mathcal{G}$; and $\mathcal{P}$, the family of probability measures on $\mathcal{G}$.
In this framework, a geodesic $\phi : [0,1] \to \mathcal{G}$ is computed where each point $\phi_t \in \mathcal{G}$,
$t \in [0,1]$, is a diffeomorphism of the domain $\Omega$. The evolution of the template image $I_0$ along the path is
given by $\phi_t I_0 = I_0 \circ \phi_t^{-1}$ such that the end point of the geodesic connects the template $I_0$ to
the target $I_1$ via $I_1 = \phi_1 I_0 = I_0 \circ \phi_1^{-1}$. Thus, anatomical variability in
the target is encoded by these geodesic transformations when a template is fixed.

Furthermore, geodesic curves induce metric distances between the template and the target shapes in the
orbit as follows. The diffeomorphisms are constructed as a flow of ordinary differential equations
$\dot{\phi}_t = v_t(\phi_t)$, $t \in [0,1]$, with $\phi_0 = \mathrm{id}$ the identity map, and associated vector
fields $v_t$, $t \in [0,1]$. The optimal velocity vector field parameterizing the geodesic path is found by solving

$$\hat{v} = \underset{v:\ \phi_1 = \int_0^1 v_t(\phi_t)\,dt,\ \phi_0 = \mathrm{id}}{\arg\inf} \int_0^1 \|v_t\|_V \, dt \quad \text{such that } I_0 \circ \phi_1^{-1} = I_1, \qquad (1)$$

where $v_t \in V$, the Hilbert space of smooth vector fields with norm $\|\cdot\|_V$ defined through a differential
operator enforcing smoothness. The length of the minimal length path through the space of transformations connecting
the given anatomical configurations in $I_0$ and $I_1$ defines a metric distance between anatomical shapes in $I_0$
and $I_1$ via

$$d(I_0, I_1) = \int_0^1 \|\hat{v}_t\|_V \, dt, \qquad (2)$$

where $\hat{v}_t$ is the optimizer calculated from the Large Deformation Diffeomorphic Metric Mapping (LDDMM)
algorithm [42]. Here, the metric distance does not have any units. The construction of such a metric space
allows one to quantify similarities and differences between anatomical shapes in the orbit. This is the vision
laid out by D'Arcy W. Thompson almost one hundred years ago. Figure 1 exemplifies the change in the
metric distance during the evolution of the diffeomorphic map from the template shape to the target shape.
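
As a minimal numerical sketch of Equation (2), suppose the $V$-norms of the optimal velocity field are already available on a uniform time grid. The function name `metric_distance` and the sample values below are hypothetical and are not outputs of the LDDMM software. Under that assumption, the path length can be approximated with the trapezoidal rule:

```python
def metric_distance(velocity_norms):
    """Approximate d(I0, I1) = integral over [0, 1] of ||v_t||_V dt.

    velocity_norms: V-norms of the velocity field sampled at equally
    spaced times t = 0, 1/(n-1), ..., 1 (hypothetical input).
    """
    n = len(velocity_norms)
    if n < 2:
        raise ValueError("need at least two time samples")
    dt = 1.0 / (n - 1)
    total = 0.0
    # Trapezoidal rule: average adjacent samples, multiply by the time step.
    for a, b in zip(velocity_norms[:-1], velocity_norms[1:]):
        total += 0.5 * (a + b) * dt
    return total


# Illustrative values only: a constant-speed path with norm 2.5.
constant_speed = [2.5] * 11
print(metric_distance(constant_speed))
```

For a path traversed at constant speed, the integral reduces to that constant norm, which gives a quick sanity check on any discretization.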

The notion of mathematical biomarker in the form of metric distance can be used in different ways. 
One is to generate metric distances of shapes relative to a template [42, 43]. Another is to generate metric 
distances between each shape within a collection [44]. The latter approach allows for sophisticated pattern 
classification analysis; it is however computationally expensive. We present an analysis based on the former 
approach which could provide a powerful tool in analyzing subtle shape changes over time with considerably 
less computational load. This approach may allow detecting the subtle morphometric changes observed in the 
hippocampus in DAT subjects in particular for those previously analyzed [29, 45]. These studies compared 
rates of change in hippocampal volume and shape in subjects with very mild DAT and matched (for age 
and gender) nondemented subjects. The change in hippocampal shape over time was defined as a residual 
vector field resulting from rigid-body motion registration, and changes in patterns along hippocampal surfaces 
were visualized and analyzed via a statistical measure of individual and group change in hippocampal shape 
over time and used to distinguish between the subject groups. Hence the motivation to analyze LDDMM 
generated metric distances between binary hippocampus images at baseline and at follow-up with respect 
to the same template hippocampus image. That is, the template was compared again, and not propagated 
between time points. One might wonder why we do not track changes within a subject directly, rather than 
via a reference template, as it could give a more sensitive measure of shape change since the small difference 
in shape would make finding correspondence more accurate. Although we have considered doing this, the 
difficulty is that since the template (or origin) is different for each longitudinal computation, how to correctly 
perform statistical comparison of group change is not completely settled. This is actively being developed by 
using the concept of "parallel transport" [46, 47]. In this study, we compute and analyze metric distances
based on the data used in [29].

We briefly describe the data set in Section 2.1, computation of metric distances via LDDMM in Section 2.2, 
statistical methods we employ in Section 2.3, and results and findings in Section 3, which include descriptive 
summary statistics of the metric distances, comparison of metric distances of hippocampi of non-demented 
subjects and subjects with very mild dementia, correlation between metric distances, comparison of distri- 
butions of metric distances, and discriminative power of metric distances. We perform similar analysis on 
hippocampal volumes in Section 4, compare volumes and LDDMM distances in Section 5, and analyze annual
percentage rate of change in volumes and distances in Section 6. In the final section, we discuss the use of
metric distances for baseline-followup studies, group comparisons, and discrimination analysis. 

2 Methods 

2.1 Subjects and Data Acquisition 

Detailed description of subjects can be found in [29] where 18 very mild DAT subjects (Clinical Dementia
Rating Scale, CDR0.5) and 26 age-matched nondemented controls (CDR0) were each scanned approximately
two years apart. Clinical Dementia Rating (CDR) Scale assessments, which detect the severity of dementia
symptoms, were performed annually in all subjects by experienced clinicians without reference to
neuropsychological tests or in vivo neuroimaging data. The experienced clinician conducted semi-structured interviews
with an informant and the subject to assess the subject's cognitive and functional performance; a
neurological examination was also obtained. The clinician determined the presence or absence of dementia and, when
present, its severity with the CDR. Overall CDR scores of 0 indicate no dementia, while CDR scores of 0.5,
1, 2, and 3 indicate very mild, mild, moderate and severe dementia, respectively [48]. CDR assessments have
been shown to have an inter-rater reliability of $\kappa = 0.74$ (weighted kappa coefficient [49] of 0.87) [50], and
this high degree of inter-rater reliability has been confirmed in multi-center dementia studies [51]. Elderly
subjects with no clinical evidence of dementia (i.e., CDR0 subjects) have been confirmed with normal brains
at autopsy with 80% accuracy; i.e., approximately 20% of such individuals show evidence of AD [52]. CDR0.5
subjects have subtle cognitive impairment, and 93% of them progress to more severe stages of illness (i.e.,
CDR > 0.5) and show neuropathological signs of AD at autopsy ([53], [54], and [52]). Although elsewhere
the CDR0.5 individuals in our sample may be considered to have MCI [55], they fulfill our diagnostic criteria
for very mild DAT and at autopsy overwhelmingly have neuropathologic AD [56]. A summary of subject
information is listed in Table 1.

The scans were obtained using a Magnetom SP-4000 1.5 Tesla imaging system, a standard head coil, and a
magnetization prepared rapid gradient echo (MPRAGE) sequence. The MPRAGE sequence (TR/TE - 10/4,
ACQ - 1, Matrix - 256 x 256, Scanning time - 11.0 min) produced 3D data with a 1 mm x 1 mm in-plane
resolution and 1 mm slice thickness across the entire cranium.

A neuroanatomical template was produced using an MR image from an additional elder control (i.e.,
CDR0 or non-demented) subject (male, age = 69). The choice and a detailed description of the template are
provided in [57]. The subject selected to produce this template was obtained from the same source as the
other subjects in the study, but was not otherwise included in the data analysis. Data used are the left and
right hippocampal surfaces in the template scan created from expert-produced manual outlines using methods
previously described [28,58], and the left and right hippocampal surfaces of each subject generated at baseline
and follow-up. These surfaces were converted to binary hippocampus volumetric images by flood filling the
inside of the surface and giving it label 1, and the outside of the surface was labeled as 0, or background. Each
individual hippocampal surface was first scaled by a factor of 2 and aligned with the template surface, which
was also scaled by a factor of 2, via a rigid-body rotation and translation before converting to volumetric
binary images. In [58] we showed that mapping accuracy could be enhanced at higher resolution because of
smaller voxels: voxels at the periphery of the structure (i.e., surface) account for much more of the structural
volume at 1 mm$^3$ voxel resolution versus 0.5 mm$^3$. Since then we have adopted this as part of the standard
mapping procedure. These surfaces were then converted into binarized images of dimension 64 x 112 x 64
with voxel resolutions of 0.5 x 0.5 x 0.5 mm$^3$, followed by smoothing by a Gaussian filter of 9 x 9 x 9-voxel
window and one voxel standard deviation to smooth out the edges for LDDMM, which was then applied to
each template-subject pair to compute metric distances, $d_k^b$, $d_k^f$ ($k = 1, \ldots, 44$), in each hemisphere
at baseline ($b$) and at follow-up ($f$) as illustrated in Figure 2.
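
The smoothing step can be illustrated in one dimension. The sketch below assumes a Gaussian kernel truncated to a nine-voxel window with a one-voxel standard deviation and zero (background) padding at the image border. The helper names are hypothetical, and the text does not describe how the actual pipeline implements the filter. Because a Gaussian kernel is separable, applying the same 1D kernel along each of the three axes gives the 9 x 9 x 9 filter.

```python
import math


def gaussian_kernel(size=9, sigma=1.0):
    """Normalized 1D Gaussian weights for a window of `size` voxels."""
    half = size // 2
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_1d(signal, kernel):
    """Convolve a 1D profile with the kernel, treating outside voxels as 0."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


# A binary profile through a structure: background (0), inside (1), background (0).
profile = [0] * 8 + [1] * 8 + [0] * 8
smoothed = smooth_1d(profile, gaussian_kernel())
print([round(v, 3) for v in smoothed])
```

The hard 0/1 edge becomes a gradual ramp that crosses 0.5 at the original boundary, which is the purpose of smoothing binary images before the mapping.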

Controlling for brain size is important because people with bigger brains tend to have bigger hippocampi,
and we do not want our results to reflect that uninteresting fact; we inherently correct for brain size by first
rigidly aligning the subject brain to the prototype brain prior to LDDMM. Segmentation of hippocampal MRI
shapes across subjects, especially in diseased states, is a challenging problem. However, the accuracy of the
segmentation is not the point of this paper and has been demonstrated before [10, 57, 58].



In addition to the metric distances, our data set also consists of the following variables: gender, age, and
education in years (these variables are used for controlling the confounding effects of these factors on hippocampus
morphometry, so the subjects are taken to be similar or evenly distributed in these variables). Furthermore,
we have brain and intracranial volumes at baseline and follow-up, and hippocampus volumes for left and right
hippocampi at baseline and follow-up.



2.2 Computing Metric Distance via Large Deformation Diffeomorphic Metric 
Mapping 

Metric distances between the binary images and the template image are obtained by computing diffeomor- 
phisms between the images. Computation and analysis of these diffeomorphic mappings have been previously 
described [57]. Diffeomorphisms are estimated via the variational problem that, in the space of smooth
velocity vector fields $V$ on domain $\Omega$, takes the form [42]:

$$\hat{v} = \underset{v}{\arg\min} \left( \int_0^1 \|v_t\|_V^2 \, dt + \|I_0 \circ \phi_1^{-1} - I_1\|_{L^2}^2 \right). \qquad (3)$$

The optimizer of this cost generates the optimal change of coordinates $\varphi = \phi_1^{\hat{v}}$ upon integration of
$d\phi_t^{\hat{v}}/dt = \hat{v}_t(\phi_t^{\hat{v}})$, $\phi_0 = \mathrm{id}$, where the index $v$ in $\phi^{v}$ is used to
explicitly denote the dependence of $\phi$ on the associated velocity field $v$. Enforcing a sufficient amount of
smoothness on the elements of the space $V$ of allowable velocity vector fields ensures that the solution to the
differential equation $\dot{\phi}_t = v_t(\phi_t)$, $t \in [0,1]$, $v_t \in V$, is
in the space of diffeomorphisms [59, 60]. The required smoothness is enforced by defining the norm on the
space $V$ of smooth velocity vector fields through a differential operator $L$ of the type
$L = (-\alpha\Delta + \gamma)^{\alpha} I_{n \times n}$ where $\alpha > 1.5$ in 3-dimensional space such that
$\|f\|_V = \|Lf\|_{L^2}$ and $\|\cdot\|_{L^2}$ is the standard $L^2$ norm for square
integrable functions defined on $\Omega$. The gradient of this cost is given by



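$$\nabla_v E_t = 2 v_t - K\left( 2\,|D\phi_{t,1}|\,\nabla J_t^0 \,(J_t^0 - J_t^1) \right) \qquad (4)$$

where $J_t^0 = I_0 \circ \phi_{t,0}$ and $J_t^1 = I_1 \circ \phi_{t,1}$, with $\phi_{s,t} = \phi_t \circ \phi_s^{-1}$;
$|Dg|$ is the determinant of the Jacobian matrix for $g$; and $K$ is a
compact self-adjoint operator $K : L^2(\Omega, \mathbb{R}^d) \to V$ uniquely defined by
$\langle a, b \rangle_{L^2} = \langle Ka, b \rangle_V$ such that for
any smooth vector field $f \in V$, $K(L^{\dagger} L) f = f$ holds. The metric distance is then calculated via Equation (2).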



2.3 Statistical Methods 

First, we investigate what LDDMM metric distance measures and how it is related to hippocampus, brain, 
and intracranial volumes. That is, as a compound measure of morphometry, how much of the metric distance 
is related to shape and size and how it is associated with the volume, which is mostly a measure of size. Along 
this line, we provide the correlation between volume and metric distance measures by the pairs plots at baseline 
and follow-up of left and right hippocampi. Furthermore, we perform a principal component analysis (PCA) 
on metric distance and volumes to characterize the major traits these quantities measure. Then we provide 
a statistical methodology for the analysis of LDDMM distances. We compute and interpret simple summary
statistics, such as mean, standard deviation (SD), minimum, first quartile ($Q_1$), median, third quartile ($Q_3$),
and maximum for $d_k^b$ and $d_k^f$. Then we apply repeated measures analysis of metric distances with diagnosis group
as main effect and timepoint as the repeated factor, side (i.e., hemisphere) as main effect and timepoint as
the repeated factor, and diagnosis group as main effect and side-by-timepoint as the repeated factor, since
there is within-subject dependence of metric distances for left and right hemispheres and at baseline and
follow-up. We apply four possible competing models, each assuming a different variance-covariance structure,
to obtain the model that best fits our data set. The first model assumes compound symmetry, in which the
diagonals (i.e., the variances) are equal, and so are the off-diagonals (i.e., the covariances). The other three
models assume unstructured, autoregressive (AR), and autoregressive heterogeneous (ARH) variances, respectively.
In the unstructured model, each variance and covariance term is different; in the AR model, the variances
are assumed to be equal but the covariances change by time; and in the ARH model, the variances are also
different and the covariances change by time. The corresponding variance-covariance (Var-Cov) structures
[61-63] for the models are shown in the accompanying table, where $\sigma^2$ is the common variance term,
$\sigma_i^2$ is the variance for repeated factor $i$, $\sigma_{ij}$ is the covariance between repeated factors $i$
and $j$, and $\rho$ is the correlation coefficient of first order in an autoregressive model. We use various model
selection criteria (Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), log-likelihood) to
compare competing models to see which model best fits our data [64].
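
To make the structures concrete, the following sketch builds the compound symmetry, AR, and ARH matrices for a small number of repeated measures, together with the usual AIC and BIC formulas. The function names and numeric values are hypothetical, and the first-order forms AR(1) and ARH(1) are assumed here. The unstructured case is omitted because it is simply an arbitrary symmetric positive definite matrix.

```python
import math


def compound_symmetry(n, var, cov):
    """Equal variances on the diagonal, one common covariance elsewhere."""
    return [[var if i == j else cov for j in range(n)] for i in range(n)]


def ar1(n, var, rho):
    """Common variance; covariance decays as rho**|i - j| with the time lag."""
    return [[var * rho ** abs(i - j) for j in range(n)] for i in range(n)]


def arh1(sds, rho):
    """Heterogeneous standard deviations with first-order autoregressive correlation."""
    n = len(sds)
    return [[sds[i] * sds[j] * rho ** abs(i - j) for j in range(n)] for i in range(n)]


def aic(log_lik, n_params):
    """Akaike Information Criterion: smaller is better."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian Information Criterion: smaller is better."""
    return n_params * math.log(n_obs) - 2 * log_lik


# Hypothetical values for three repeated measures.
for row in ar1(3, 2.0, 0.5):
    print(row)
print(aic(-10.0, 2), bic(-10.0, 2, 100))
```

Compound symmetry and AR(1) each use two covariance parameters, while ARH(1) uses one standard deviation per repeated measure plus $\rho$. This difference in parameter count is what the AIC and BIC penalties weigh against the gain in log-likelihood.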

For post-hoc comparison of CDR0.5 vs CDR0, our null hypothesis for the comparison of diagnosis groups,
CDR0 and CDR0.5, is $H_0: \mu_{CDR0} = \mu_{CDR0.5}$ for each of the baseline left, baseline right, follow-up left,
and follow-up right hippocampi. For the t-test, among the underlying assumptions are the normality of the distributions
and homogeneity of the variances of the independent samples. We employ Lilliefors' test of normality [65]
and Brown and Forsythe's (B-F) test (i.e., Levene's test with absolute deviations from the median) for
homogeneity of the variances [66]. If there is lack of significant deviation from normality of the distribution of the
metric distances for a group, we will state it as "the metric distances for the group can be assumed to come
from a normally distributed population", henceforth.
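
The B-F statistic described above is a one-way ANOVA $F$ statistic computed on absolute deviations from each group's median. A minimal sketch follows, with a hypothetical function name and made-up data rather than the study's measurements:

```python
import statistics


def brown_forsythe(*groups):
    """F statistic of a one-way ANOVA on |x - median(group)|."""
    devs = [[abs(x - statistics.median(g)) for x in g] for g in groups]
    n_total = sum(len(d) for d in devs)
    k = len(devs)
    grand_mean = sum(sum(d) for d in devs) / n_total
    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(d) * (statistics.fmean(d) - grand_mean) ** 2 for d in devs)
    within = sum((z - statistics.fmean(d)) ** 2 for d in devs for z in d)
    return (between / (k - 1)) / (within / (n_total - k))


# Made-up samples: the second group has five times the spread of the first.
narrow = [1, 2, 3, 4, 5]
wide = [0, 5, 10, 15, 20]
print(brown_forsythe(narrow, wide))
```

The statistic is referred to an $F$ distribution with $k - 1$ and $N - k$ degrees of freedom, where $k$ is the number of groups and $N$ the total sample size. A shift in location alone leaves it at zero, because only the spread around each median enters.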

We compare metric distances at baseline and follow-up. The LB-CDR0.5 metric distances and LF-CDR0.5
metric distances are dependent as they come from the same person at baseline and follow-up. Likewise for
LB-CDR0 and LF-CDR0 pairs. Hence our null hypothesis for the comparison of baseline and follow-up groups
is $H_0: \delta(B,F) = 0$, where $\delta(B,F)$ is the mean difference of metric distances between hippocampi at baseline
and follow-up for each of CDR0 left, CDR0 right, CDR0.5 left, CDR0.5 right hippocampi. We also compare
the metric distances for the left and right hippocampi, which are dependent (each left-right hippocampi pair
comes from the same subject). Hence our null hypothesis for the comparison of left and right metric distances
is $H_0: \delta(L,R) = 0$, where $\delta(L,R)$ is the mean difference between left and right metric
distances for each of CDR0 baseline, CDR0 follow-up, CDR0.5 baseline, CDR0.5 follow-up hippocampi.
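
The dependent-sample comparison amounts to a one-sample $t$ test on the within-subject differences. The sketch below uses a hypothetical function name and made-up distances, not the study data:

```python
import math
import statistics


def paired_t(baseline, followup):
    """Paired t statistic and degrees of freedom for H0: mean difference = 0."""
    diffs = [f - b for b, f in zip(baseline, followup)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the differences
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, n - 1


# Made-up metric distances for four subjects.
baseline = [1.0, 2.0, 3.0, 4.0]
followup = [2.0, 4.0, 3.0, 6.0]
print(paired_t(baseline, followup))
```

Because each subject serves as their own control, between-subject variability cancels in the differences. This is why a paired test can detect a change that an independent two-sample test would miss.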

We also calculate and interpret correlation coefficients between metric distances. Since metric distances of
all groups can be assumed to have normal distribution based on Lilliefors' test of normality, we use Pearson's
correlation coefficient, denoted $r_P$, between baseline and follow-up (overall and by diagnosis group) and for
the left and right hippocampi and the corresponding tests of $H_0: r_P = 0$ vs $H_a: r_P > 0$ for inference
[67,68].
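
For reference, the sample correlation and the usual test statistic for zero correlation, $t = r\sqrt{(n-2)/(1-r^2)}$ with $n - 2$ degrees of freedom, can be computed as in this sketch (function names and data are hypothetical):

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r, n):
    """t statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Made-up baseline and follow-up distances for five subjects.
base = [2.1, 2.4, 3.0, 3.3, 3.9]
follow = [2.3, 2.9, 3.1, 3.8, 4.0]
r = pearson_r(base, follow)
print(r, correlation_t(r, len(base)))
```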

We also estimate the empirical cumulative distribution functions (cdf) of the metric distances and compare
them by Kolmogorov-Smirnov (K-S) test, Cramér's test, and Cramér-von Mises test. The null hypothesis for
the comparison of cdfs of the metric distances per diagnosis groups, CDR0 and CDR0.5, is
$H_0: F_{CDR0} = F_{CDR0.5}$ for each of the baseline left, baseline right, follow-up left, follow-up right
hippocampi. For calculation of the critical value of Cramér's test, the kernel $\phi_C(x) = \sqrt{x}/2$ (which is
recommended for location alternatives) is used. The estimated p-values are based on $\alpha = 0.05$ and 10000
ordinary bootstrap replicates.

We apply logistic discrimination with metric distances and other variables, since the diagnosis has only
two levels, namely CDR0 and CDR0.5. We use logistic regression to estimate or predict the risk or probability
of having DAT using metric distances, together with side (i.e., hemisphere) and timepoint (baseline vs
follow-up) factors. In other words, we model the probability that the subject is CDR0.5 given the metric distance
of the subject for left or right hippocampus at baseline or follow-up. In standard logistic regression the
model parameters are obtained via maximum likelihood estimators. For more on logistic regression and
logistic discrimination, see [69] and [70], respectively. We consider the logistic model with the response
$\operatorname{logit} p = \log[p/(1-p)]$ where $p = P(Y = 1)$ (i.e., the probability that the subject is diagnosed
with CDR0.5). First we model with one predictor variable at a time from side, timepoint, metric distance, etc.;
if the variable is not significant at the .05 level, we omit that variable from further consideration. We then
consider the full logistic model with the same response and
the remaining variables with all possible interactions as the predictor variables. On this full model, we choose
a reduced model by AIC in a stepwise algorithm, and then use stepwise backward elimination procedure on
the resulting model [64]. We stop the elimination procedure when all the remaining variables are significant
at the $\alpha = 0.05$ level.

Based on the final model with significant predictors, we apply logistic discrimination. In logistic
discrimination the log-odds-ratio of the conditional classification and therefore indirectly the conditional probabilities
of being CDR0.5 and CDR0 are modeled. In general, if this estimated probability is larger than a prespecified
probability $p_0$, the subject is classified as CDR0.5, otherwise the subject is classified as CDR0 (i.e., normal).
This means our decision function reduces to

$$\hat{p}_k = P(Y = 1 \mid d_{ijk}) \begin{cases} > p_0 & \text{classify as CDR0.5}, \\ \le p_0 & \text{classify as CDR0}, \end{cases} \qquad (5)$$

where $p_0$ is usually taken to be 0.5. This threshold probability $p_0$ can also be optimized with respect to a
cost function which incorporates correct classification rates, sensitivity, and/or specificity [71].
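
The decision rule in Equation (5) can be sketched for a model with a single predictor. The intercept and slope below are hypothetical placeholders, not fitted values from this study, and the function names are likewise illustrative:

```python
import math


def predicted_probability(intercept, slope, distance):
    """Inverse logit: P(Y = 1 | d) for a one-predictor logistic model."""
    logit = intercept + slope * distance
    return 1.0 / (1.0 + math.exp(-logit))


def classify(probability, p0=0.5):
    """Equation (5): CDR0.5 if the probability exceeds p0, otherwise CDR0."""
    return "CDR0.5" if probability > p0 else "CDR0"


# Hypothetical coefficients; the logit is zero at a distance of 3.0.
intercept, slope = -6.0, 2.0
for d in (2.5, 3.0, 3.5):
    p = predicted_probability(intercept, slope, d)
    print(d, round(p, 3), classify(p))
```

Lowering $p_0$ classifies more subjects as CDR0.5, which raises sensitivity at the expense of specificity. This trade-off is what a cost-based choice of the threshold balances.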

We apply the same analysis procedure on hippocampal volumes to compare the results with LDDMM
metric distances. Furthermore, we find the differential volume loss and metric distance change by using the
annual percentage rate of change (APC) in volume and metric distance (see [71] for APC in volume for
entorhinal cortex). We also consider the logistic discrimination models that incorporate volume and metric
distance together and APC in volumes and metric distances together.
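
The exact APC formula is given in [71] and is not restated here. The sketch below assumes one common definition: the change from baseline to follow-up, expressed as a percentage of the baseline value and divided by the scan interval in years. The function name and the numbers are hypothetical.

```python
def annual_percentage_change(baseline, followup, interval_years):
    """Assumed APC: percent change relative to baseline, per year of follow-up."""
    if baseline == 0 or interval_years <= 0:
        raise ValueError("baseline must be nonzero and interval positive")
    return 100.0 * (followup - baseline) / baseline / interval_years


# Made-up example: a volume dropping from 2000 to 1900 mm^3 over 2 years.
print(annual_percentage_change(2000.0, 1900.0, 2.0))
```

Under this definition, a negative APC in volume indicates atrophy, while a positive APC in metric distance indicates increasing deviation from the template.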



3 Analysis of LDDMM Distances of Hippocampi 

3.1 Preliminary Analysis of LDDMM Distances and Other Variables 

The summary measures for the variables are provided in Table 1. Observe that the subjects are evenly
distributed in terms of gender, years of education, scan intervals, and age for the diagnostic groups. The
brain and intracranial volumes are much larger in scale, then come the hippocampal volumes, and then the
metric distances. Notice that brain and hippocampal volumes all decrease over time and are smaller in CDR0.5
subjects compared to CDR0 subjects. On the other hand, the metric distances tend to increase over time and
are larger for the CDR0.5 subjects. Also presented in Table 1 are the p-values for Lilliefors' test of normality
and Wilcoxon rank sum test for differences between the diagnostic groups. Notice that most variables can
be assumed to follow a Gaussian distribution, but since a few fail to do so, we apply the Wilcoxon rank
sum test instead of Welch's t-test. The diagnostic groups do not significantly differ in age, education, brain
and intracranial volumes. Furthermore, among the metric distances, we see that only right follow-up metric
distances are significantly different between the diagnostic groups.

We present the pairs plot (scatter plot of each pair) of continuous variables in Figure [3] and also calculate 
the correlation coefficients between each pair of the variables (not presented). We observe that age and 
education are not significantly correlated with any of the other variables. Hence we discard them in our
subsequent analysis (except for logistic discrimination). We observe significant correlation between each pair
of hippocampal volumes, and between each pair of brain and intracranial volumes. The metric distances 
are only moderately correlated with each other. Hippocampal volumes are mildly correlated with brain and 
intracranial volumes. The same holds for the metric distances but to a lesser extent. 

Summary statistics (mean, standard deviation (SD), minimum, first quartile ($Q_1$), median,
third quartile ($Q_3$), and maximum) for $d_k^b$ and $d_k^f$ are presented in Table 1.

Baseline metric distances seem to be different in distribution (location and spread) from follow-up metric
distances, with follow-up distances being larger than baseline distances for both left and right hippocampi; likewise,
left metric distances seem to be different from right metric distances, with right distances being larger than
left for both baseline and follow-up. Let LDB be the metric distances for left hippocampi at baseline, LDF
be the metric distances for left hippocampi at follow-up. Let RDB and RDF be similarly defined for right
hippocampi. One-tailed t-tests revealed that the order of these measures is LDB < LDF < RDB < RDF, with
all inequalities being significant at the .05 level. This implies that the morphometric differences of left hippocampi
with respect to the left hippocampus of the template subject at baseline are significantly smaller than those
at follow-up, i.e., at baseline, left hippocampi are more similar to the left template hippocampus, and by
follow-up left hippocampi tend to become more different in morphometry (shape and size) from the template
hippocampus. This is not surprising, as the template hippocampus is one of the baseline hippocampi; that
is, the template was based on a baseline scan. Although this might seem irrelevant in view of the wide
age variation, age is not the main point here: when baseline and follow-up are compared, we use
matched pair (i.e., dependent) tests, which reveal differences that would otherwise be concealed by
independent two-sample tests. For example, when all the subjects age about two years, their morphometric
alterations accumulate to render their relative difference from the template more significant.



The right hippocampi reveal similar morphometric differences and change over time. Furthermore, we 
observe that the morphometric difference of right hippocampi from the right template hippocampus is sig- 
nificantly larger compared to the morphometric difference of left hippocampi from the left template at both 
baseline and follow-up. The summary statistics (means and standard deviations (SD)) for left and right
metric distances by group are provided in Table 1.

Observe that CDR0 distances are smaller than CDR0.5 distances at baseline and at follow-up for both left
and right hippocampi. This suggests that the morphometric differences of CDR0 hippocampi with respect to
the template hippocampus are smaller than those of CDR0.5 hippocampi. This is not surprising, considering
that the template hippocampus is one of the CDR0 hippocampi. Furthermore, the standard deviations of the
distances for CDR0 subjects tend to be smaller than those of CDR0.5 subjects. That is, the morphometric
variability of CDR0 hippocampi with respect to the template hippocampus is smaller than that of CDR0.5
hippocampi. The statistical significance of these results will be provided in the following sections.

See also Figure [4] for the (jittered) scatter plots of the metric distances by group, where the crosses are 
centered at the mean distances and the points are jittered (scattered) along the horizontal axis in order to avoid 
frequent point concurrence and tight clustering of points, thereby making the plot better for visualization. 

3.2 Principal Component Analysis for the Volumes and Metric Distances 

The volumes and metric distances measure different but related aspects of morphometry, so some of the 
variables are highly correlated with each other (see Figure 3). We perform principal component analysis 
(PCA) to obtain a set of uncorrelated variables that hopefully represent some identifiable aspect of the 
morphometry. See [70,76] for more on PCA. When we perform PCA of metric distances and volumes of left 
hippocampi at baseline with eigenvalues based on the covariance matrix, we observe that the first principal 
component (PC) accounts for almost all the variation (see Table 2). Considering the variable loadings in 
Table 2, we see that PC1 is the head size component, PC2 is the contrast between brain and intracranial 
volumes, PC3 is the hippocampus size, and PC4 is the metric distance component. However, the volumes 
are in mm³ and metric distances are unitless, hence the data are not on a common scale. In particular, the 
brain and intracranial volumes have the largest variation in the data set, and hence dominate the PCs. To 
remove the influence of the scale (or unit), we apply PCA with eigenvalues based on the correlation matrix 
(i.e., PCA on the standardized variables). The importance scores of principal components and variable 
loadings from the PCA of metric distances and volumes of left hippocampi at baseline and follow-up with 
eigenvalues based on the correlation matrix are presented in Table 3. Notice that with the correlation 
matrix, the first three PCs account for almost all the variation in the variables. Comparing the variable 
loadings, PC1 seems to be the head size component, PC2 is the hippocampus shape, PC3 is the hippocampus 
size and the contrast between hippocampus and head size, and PC4 is the contrast between brain and 
intracranial volume. The PCA on variables for right hippocampi yields similar results (see Table 4).
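
The scale effect described above can be illustrated with a minimal Python sketch. It is not the analysis used in this study: it uses only two variables (so the eigenvalues have a closed form), the data are synthetic, and the names `volume`, `distance`, and `explained_by_first_pc` are hypothetical. It shows only that a covariance-based PCA is dominated by the variable with the largest variance, whereas a correlation-based PCA is not.

```python
import math
import random

def eigenvalues_2x2(a, b, c):
    """Eigenvalues (largest first) of the symmetric matrix [[a, b], [b, c]]."""
    center = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return center + radius, center - radius

def explained_by_first_pc(x, y, standardize):
    """Share of total variance carried by PC1 for two variables.

    standardize=False uses the covariance matrix,
    standardize=True uses the correlation matrix.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x) / (n - 1)
    syy = sum((v - my) ** 2 for v in y) / (n - 1)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y)) / (n - 1)
    if standardize:
        r = sxy / math.sqrt(sxx * syy)
        lam1, lam2 = eigenvalues_2x2(1.0, r, 1.0)
    else:
        lam1, lam2 = eigenvalues_2x2(sxx, sxy, syy)
    return lam1 / (lam1 + lam2)

random.seed(0)
# Synthetic (hypothetical) data: a volume in mm^3 and a unitless distance.
volume = [random.gauss(2000.0, 350.0) for _ in range(44)]
distance = [random.gauss(3.0, 0.6) for _ in range(44)]

share_cov = explained_by_first_pc(volume, distance, standardize=False)
share_cor = explained_by_first_pc(volume, distance, standardize=True)
print(f"PC1 share, covariance matrix:  {share_cov:.4f}")
print(f"PC1 share, correlation matrix: {share_cor:.4f}")
```

With the covariance matrix, PC1 is essentially the volume variable alone; after standardization, the two variables contribute on an equal footing.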

The variable loadings in the PCA of variables at baseline and follow-up suggest that brain and intracra- 
nial volumes are mostly measures of head size, metric distance is mostly a measure of hippocampus shape 
and partly is a measure of head and hippocampus size, and hippocampus volume is mostly a measure of 
hippocampus size and partly related to hippocampus shape and head size. That is, volumes and the metric 
distance convey information that is related but not identical. Volumes mostly emphasize the size differences, 
while metric distances emphasize the shape differences. Hence, one should use both of them in morphometric 
analysis of brain tissues. 

3.3 Repeated Measures Analysis of LDDMM Distances 

Due to within-subject dependence of metric distances for left and right hemispheres and for baseline and 
follow-up measures, we apply repeated-measures analysis with group and side as main effects and timepoint 
as the repeated factor, and group as main effect and side-by-timepoint as the repeated factor (see below). 
For the left data, metric distances at baseline for CDR0.5 subjects are labeled as LB-CDR0.5, and those at 
follow-up are labeled as LF-CDR0.5. CDR0 individuals are labeled as LB-CDR0 and LF-CDR0 accordingly. Similar 
labeling is done for the right metric distances. Hence, we have four measurements for each subject, so repeated 
measures analysis can be performed on our data set.



3.3.1 Modeling LDDMM Distances with Group as Main Effect with Compound Symmetry in 
Var-Cov Structure 



For the repeated measures ANOVA with group as main effect and compound symmetry repeated over time, 
for each subject, we will denote diagnosis, timepoint, and hemisphere factors as numerical subscripts for 
convenience. The corresponding model is 

$d_{ijk} = \mu + \alpha_i^D + \alpha_j^T + \alpha_{ij}^{DT} + \varepsilon_{ijk}$    (6) 

where $d_{ijk}$ is the distance for subject $k$ with diagnosis $i$ ($i = 1$ for CDR0; 2 for CDR0.5) at timepoint $j$ 
($j = 1$ for baseline; 2 for follow-up), $\mu$ is the overall mean, $\alpha_i^D$ is the effect of diagnosis level $i$, 
$\alpha_j^T$ is the effect of timepoint level $j$, $\alpha_{ij}^{DT}$ is the diagnosis-by-timepoint interaction, i.e., the part of 
the mean distance not attributable to the additive effect of diagnosis and timepoint, and $\varepsilon_{ijk}$ is the error 
term. The Var-Cov structure for the error term is 

$\mathrm{Var}(\varepsilon_{ijk}) = \sigma^2$ and $\mathrm{Cov}(\varepsilon_{ijk}, \varepsilon_{ij'k}) = \sigma_1^2$.

Notice that the effect of side (left or right) is ignored in this model. There is no significant group main effect 
(F = 3.36, df = 1,42, p = 0.0739). However, the within-group timepoint main effect (F = 11.16, df = 
1,130, p = 0.0011) and the group-by-timepoint interaction (F = 4.84, df = 1,130, p = 0.0295) are both 
significant, which implies that the two groups should be compared at the different time points. In Figure we 
present the interaction plots for diagnosis over time, where the end points of the line segments are located at 
the mean metric distances at baseline and follow-up years. We see that both lines increase over time, but are 
not parallel; the increase of the line for the CDR0.5 group is steeper.
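
To make the notion of a group-by-timepoint interaction concrete, the following minimal Python sketch computes the change from baseline to follow-up in each group and the difference between these two changes (a contrast that is zero when the lines in an interaction plot are parallel). The cell means below are hypothetical placeholders, not the values observed in this study, and the function names are ours.

```python
def time_slopes(cell_means):
    """Change in mean distance from baseline to follow-up, per group."""
    return {group: m["follow-up"] - m["baseline"] for group, m in cell_means.items()}

def interaction_contrast(cell_means):
    """Difference of the two slopes (CDR0.5 minus CDR0).

    A value of zero corresponds to parallel lines in the interaction plot,
    i.e., no group-by-timepoint interaction in the cell means.
    """
    slopes = time_slopes(cell_means)
    return slopes["CDR0.5"] - slopes["CDR0"]

# Hypothetical cell means (illustration only).
means = {
    "CDR0": {"baseline": 3.0, "follow-up": 3.25},
    "CDR0.5": {"baseline": 3.0, "follow-up": 4.0},
}

print(time_slopes(means))
print(interaction_contrast(means))
```

Here both groups increase over time, but the CDR0.5 line is steeper, which is the qualitative pattern described above; whether such a contrast is statistically significant is what the F test for the interaction term assesses.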

3.3.2 Modeling LDDMM Distances with Side as Main Effect with Compound Symmetry in 
Var-Cov Structure 

For the repeated measures ANOVA with side as main effect and compound symmetry repeated over time, 
the corresponding model is 

$d_{ijk} = \mu + \alpha_i^S + \alpha_j^T + \alpha_{ij}^{ST} + \varepsilon_{ijk}$    (7) 

where $d_{ijk}$ is the distance for subject $k$ for side $i$ ($i = 1$ for left; 2 for right) at timepoint $j$ ($j = 1$ for 
baseline; 2 for follow-up), $\mu$ is the overall mean, $\alpha_i^S$ is the effect of side level $i$, $\alpha_j^T$ is the effect of 
timepoint level $j$, $\alpha_{ij}^{ST}$ is the side-by-timepoint interaction, and $\varepsilon_{ijk}$ is the error term. The Var-Cov 
structure for the error term is 

$\mathrm{Var}(\varepsilon_{ijk}) = \sigma^2$ and $\mathrm{Cov}(\varepsilon_{ijk}, \varepsilon_{ij'k}) = \sigma_1^2$.

Notice that the effect of diagnosis (CDR0 or CDR0.5) is ignored in this model. The side and timepoint main 
effects are both significant (F = 20.25, df = 1,129, p < 0.0001 and F = 12.51, df = 1,129, p = 0.0006, 
respectively), but the side-by-timepoint interaction is not significant (F = 1.85, df = 1,129, p = 0.1766). 
Consequently, the lines are parallel but far apart, and the main effect of side is meaningful and about 
the same at each timepoint. Moreover, the sides do change in morphometry over time. In Figure 5 we see 
that both lines increase over time and are parallel, but the slope for the right side seems to be steeper, which 
will eventually make the slope estimates significantly different.

3.3.3 Modeling LDDMM Distances with Group, Side, and Group-by-Side Interaction 

Looking at models including only the main effects of side or group separately does not answer all our questions. 
We would also like to know, for example, if the metric distances of left hippocampi of CDR0.5 subjects are 
different from those of left CDR0 subjects. In order to address these types of questions we need to look at a 
model that includes the interaction of diagnosis and side. First, we need to model the Var-Cov structure for 
the repeated measures for each subject. We have four correlated measures per subject, namely LDB, LDF, 
RDB, and RDF. Below is the estimated Var-Cov matrix for these variables: 

$\begin{bmatrix} 0.46 & 0.35 & 0.18 & 0.02 \\ 0.35 & 0.60 & 0.24 & 0.18 \\ 0.18 & 0.24 & 0.45 & 0.23 \\ 0.02 & 0.18 & 0.23 & 0.44 \end{bmatrix}$



We start with compound symmetry for our model, and then try unstructured, autoregressive (AR), and 
autoregressive heterogeneous (ARH) Var-Cov structures. The variances (in the diagonal) suggest heterogeneity 
between them, and the covariances also seem to differ. This suggests that either an unstructured or ARH model 
might fit this data best. See Table 5 for the comparison of model selection criteria such as AIC, BIC, and 
log-likelihood, and the likelihood ratio test p-value.
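
As a rough illustration of the candidate structures and criteria, the Python sketch below builds compound symmetry and first-order autoregressive matrices for four repeated measures and evaluates AIC and BIC from a -2 log-likelihood score. The function names and any numerical inputs are hypothetical; the criteria are written in their textbook form (-2 log L plus a penalty), and statistical packages may differ in which parameters and which sample size enter the penalty.

```python
import math

def compound_symmetry(variance, covariance, n):
    """n x n matrix with a common variance and a common covariance."""
    return [[variance if r == c else covariance for c in range(n)] for r in range(n)]

def ar1(variance, rho, n):
    """n x n first-order autoregressive matrix: variance * rho ** |r - c|."""
    return [[variance * rho ** abs(r - c) for c in range(n)] for r in range(n)]

def n_cov_params(structure, n):
    """Number of covariance parameters for n repeated measures per subject."""
    return {"CS": 2, "AR": 2, "ARH": n + 1, "UN": n * (n + 1) // 2}[structure]

def aic(minus2_loglik, n_params):
    """Akaike information criterion (smaller is better)."""
    return minus2_loglik + 2 * n_params

def bic(minus2_loglik, n_params, n_obs):
    """Bayesian information criterion (smaller is better)."""
    return minus2_loglik + n_params * math.log(n_obs)

for row in ar1(1.0, 0.5, 4):
    print(row)
for name in ("CS", "AR", "ARH", "UN"):
    print(name, n_cov_params(name, 4))
```

The unstructured matrix needs ten parameters for four measures, against two for compound symmetry or AR, which is why penalized criteria such as AIC and BIC can prefer a simpler structure even when the unstructured model attains the smallest -2 log-likelihood.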

The most promising model is the unstructured model based on likelihood ratio test, since -2 Log Likelihood 
scores are significantly smaller than the -2 Log Likelihood scores of other models. However, BIC and AIC favor 
the model with the AR variance-covariance structure. Besides, the log-likelihood approach gives the second 
smallest -2 Log Likelihood score for this model. Hence, we choose the model with AR Var-Cov structure. 
The corresponding model is 

$d_{ijkl} = \mu + \alpha_i^S + \alpha_j^D + \alpha_k^T + \alpha_{ij}^{SD} + \alpha_{ik}^{ST} + \alpha_{jk}^{DT} + \alpha_{ijk}^{SDT} + \varepsilon_{ijkl}$,    (8) 

where $d_{ijkl}$ is the distance for subject $l$ for side $i$ (1 for left; 2 for right) with diagnosis $j$ ($j = 1$ for CDR0; 2 
for CDR0.5) at timepoint $k$ ($k = 1$ for baseline; 2 for follow-up), $\mu$ is the overall mean, $\alpha_i^S$ is the effect of 
side level $i$, $\alpha_j^D$ is the effect of diagnosis level $j$, $\alpha_k^T$ is the effect of timepoint level $k$, $\alpha_{ij}^{SD}$ is the 
side-by-diagnosis interaction, $\alpha_{ik}^{ST}$ is the side-by-timepoint interaction, $\alpha_{jk}^{DT}$ is the diagnosis-by-timepoint 
interaction, $\alpha_{ijk}^{SDT}$ is the side-by-diagnosis-by-timepoint interaction, and $\varepsilon_{ijkl}$ is the error term. The 
Var-Cov structure for the error term is 



$\mathrm{Cov}(\varepsilon_{ijkl}, \varepsilon_{i'jk'l}) = \sigma^2 \rho^{h}$, 

where $\rho$ is the autoregressive correlation parameter and $h$ is the lag between the two measurements in the 
within-subject sequence of repeated measures. 

The three-way interaction of side-by-group-by-timepoint is not significant (F = 0.50, df = 1,168, p = 0.4823), 
and neither are the two-way side-by-group (F = 0.76, df = 1,168, p = 0.3860) and side-by-timepoint 
interactions (F = 2.25, df = 1,168, p = 0.1359). On the other hand, the group-by-timepoint interaction 
is significant (F = 8.47, df = 1,168, p = 0.0041). The main effects of side, group, and timepoint are all 
significant (F = 6.12, df = 1,168, p = 0.0143; F = 4.05, df = 1,168, p = 0.0457; and F = 19.52, df = 
1,168, p < 0.0001, respectively), but due to the interaction, the main effect for diagnosis (i.e., group) is close to 
clinically meaningless; i.e., the groups should be compared at each time point instead of an overall comparison 
of group means. However, the significant main effects of timepoint and side are interpretable between baseline 
and follow-up.

Below we perform various post-hoc tests to see which groups are significantly different or significantly 
change over time. To accomplish this, we test for differences at each timepoint, between baseline and 
follow-up, and between left and right distances.



3.4 Post-Hoc Comparison of LDDMM Distances of CDR0.5 vs CDR0 Hippocampi

For the p-values regarding the comparison of independent groups, see Table 7. The significant values at α = 
0.05 are marked with *. None of the distance groups deviate significantly from normality (all p-values greater 
than 0.10). That is, the distance distribution of each group can be assumed to be Gaussian. 
Moreover, LB-CDR0.5 and LB-CDR0 distances can be assumed to have equal variances (p = 0.2948), and so 
can RB-CDR0.5 and RB-CDR0 (p = 0.2273). But the variance of LF-CDR0 distances is significantly smaller 
than that of LF-CDR0.5 distances (p = 0.0294), and similarly for RF-CDR0 versus RF-CDR0.5 (p = 0.0262). 
Therefore, for comparisons at baseline, we can use the p-values from the t-tests [68], while for follow-up 
comparisons, it is more appropriate to use the p-values from Wilcoxon rank sum tests [67].
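
For readers unfamiliar with the rank sum test, the following minimal Python sketch computes the Wilcoxon rank sum W of the first sample and the equivalent Mann-Whitney U statistic, using midranks for ties. It is an illustration only: the function names are ours, the example data are hypothetical, and no p-value is computed (in practice one would rely on a statistical package for that).

```python
def midranks(values):
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def rank_sum(x, y):
    """Return (W, U): rank sum of x in the pooled sample and its Mann-Whitney U."""
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[: len(x)])
    u = w - len(x) * (len(x) + 1) / 2.0
    return w, u

# Hypothetical distances for two independent groups.
group_a = [2.1, 2.4, 2.9, 3.0]
group_b = [2.8, 3.3, 3.7, 4.1]
print(rank_sum(group_a, group_b))
```

Because the statistic depends only on ranks, it does not require the equal-variance assumption under which the pooled two-sample t-test is derived, which is the motivation for preferring it at follow-up.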

Observe that RF-CDR0.5 mean distances are significantly larger than RF-CDR0 mean distances at the .05 
level (p = 0.0106), and LF-CDR0.5 distances are significantly larger than LF-CDR0 distances at the 0.10 level 
(p = 0.0813). On the other hand, LB-CDR0.5 and LB-CDR0 distances are not significantly different (p = 
0.5362), and likewise for RB-CDR0.5 and RB-CDR0 distances (p = 0.8176). This implies that at baseline, 
the morphometric differences of CDR0.5 and CDR0 hippocampi with respect to the template hippocampus 
are about the same, which might indicate no significant shape differences in the left and right hippocampi due 
to dementia. However, since the metric distances do not necessarily provide direction in either shape or 
size, this is not a decisive implication. At follow-up, left and right hippocampi of CDR0.5 subjects tend 
to differ significantly more in morphometry from the template than those of CDR0 subjects. Moreover, 
this significance emerges over time; that is, right hippocampi of CDR0.5 subjects tend to undergo more 
alteration in morphometry compared to those of CDR0 subjects over time.

3.5 Comparison of Baseline and Follow-up Metric Distances 

For the comparison of dependent groups by the paired difference method, see Table. The paired differences 
in Table 7 can all be assumed to be normal based on Lilliefors' test of normality. Hence, we use the more 
powerful t-test for paired differences [68].
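
The paired difference method can be sketched in a few lines of Python. The snippet below computes the paired t statistic and its degrees of freedom from baseline and follow-up values of the same subjects; the function name and the example numbers are hypothetical, and the p-value (which requires the t distribution) is left to a statistical package.

```python
import math

def paired_t(baseline, follow_up):
    """Paired t statistic for follow_up minus baseline, with degrees of freedom."""
    diffs = [f - b for b, f in zip(baseline, follow_up)]
    n = len(diffs)
    mean = sum(diffs) / n
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    t_stat = mean / math.sqrt(variance / n)
    return t_stat, n - 1

# Hypothetical distances for four subjects at the two timepoints.
baseline = [1.0, 2.0, 3.0, 4.0]
follow_up = [2.0, 4.0, 5.0, 8.0]
t_stat, df = paired_t(baseline, follow_up)
print(f"t = {t_stat:.3f}, df = {df}")
```

The test works on the within-subject differences, so the dependence between the two measurements of a subject is accounted for; normality is required of the differences, not of the two samples separately.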

Observe that LB-CDR0.5 metric distances are significantly smaller than LF-CDR0.5 distances at α = .05 
(p = 0.0259). Likewise for RB-CDR0.5 vs RF-CDR0.5 distances (p = 0.0002). That is, CDR0.5 hippocampi 
tend to become more different in morphometry from the template, which implies that for both left and 
right distances there is significant change in morphometry (perhaps reduction in size) of CDR0.5 hippocampi 
over time. In fact, significant volume reduction over time is detected [29]. The morphometric changes in 
CDR0.5 right hippocampi from baseline to follow-up are marginally significantly larger than those of CDR0.5 
left hippocampi (p = 0.0445). The associated p-value here is obtained by testing the difference sets 
(LB-CDR0.5)-(LF-CDR0.5) versus (RB-CDR0.5)-(RF-CDR0.5) using the usual paired t-test. On the other hand, 
only RB-CDR0 is almost significantly less than RF-CDR0 at the .05 level (p = 0.0621), which implies there is 
not strong evidence for shape change in control subjects over time, but some weak evidence for mild change 
in right hippocampi can be detected as a result of aging. Furthermore, the morphometric changes in CDR0 
right hippocampi from baseline to follow-up are not significantly different from those of CDR0 left hippocampi 
(p = 0.3817).

The morphometric changes in CDR0.5 left hippocampi from baseline to follow-up are not significantly 
different from those of CDR0 left hippocampi (p = 0.1337), while the morphometric changes in CDR0.5 
right hippocampi from baseline to follow-up are significantly larger than those of CDR0 right hippocampi 
(p = 0.0074). Therefore, over time, DAT influences the morphometry of right hippocampi more compared to 
left hippocampi.

3.6 Comparison of LDDMM Distances of Left and Right Hippocampi 

As for left vs right comparisons, LB-CDR0.5 and RB-CDR0.5 distances are not significantly different from 
each other (p = 0.3046), whereas LF-CDR0.5 distances are significantly smaller than RF-CDR0.5 distances at 
the .05 level (p = 0.0179); the same holds for the LB-CDR0 vs RB-CDR0 (p = 0.0215) and LF-CDR0 vs RF-CDR0 
(p = 0.0021) comparisons. This implies that at baseline morphometric differences of CDR0.5 left hippocampi 
from the left template are about the same as those of CDR0.5 right hippocampi from the right template. 
On the other hand, at follow-up, morphometric differences of CDR0.5 left hippocampi are smaller than those 
of CDR0.5 right hippocampi. At baseline and follow-up, morphometric differences of CDR0 left hippocampi 
from the left template are smaller than those of CDR0 right hippocampi. That is, CDR0 left hippocampi are 
more similar in morphometry to the left template when compared to CDR0 right hippocampi to the right 
template. These distance comparisons for left versus right hippocampi would imply left-right morphometric 
asymmetry only if the left and right hippocampi of the template subject were very similar (up to a reflection). 
Otherwise, these comparisons are only suggestive of morphometric differences from the respective hemisphere 
(side) of the hippocampi.

3.7 Analysis of the Correlation between Metric Distances of Dependent Hippocampi

Correlation coefficients between metric distances for baseline and follow-up (overall and by group) and for the 
left and right hippocampi are provided in Table 8 and Table 9, respectively, where Pearson's product-moment 
correlation coefficient is denoted as $r_P$, Spearman's rank correlation coefficient is denoted as $\rho_S$, and 
Kendall's rank correlation coefficient is denoted as $\tau_K$ [67, 68]. The corresponding null and alternative 
hypotheses are 

$H_o$: correlation = 0 vs $H_a$: correlation > 0. 

The values in the parentheses to the right of the correlation coefficients are the corresponding p-values. The 
significant p-values at level α = 0.05 are marked with an asterisk (*). Since all groups can be assumed to be 
normal, the more powerful Pearson's correlation test will be used for inference.

From the correlation analysis of baseline vs follow-up, we see that the overall distances, 
L-CDR0, and R-CDR0 are significantly correlated at the 0.05 level. But LB-CDR0 and LF-CDR0 are significantly 
correlated at the .05 level based on Pearson's and Spearman's tests, and at the 0.10 level by Kendall's test. 
However, RB-CDR0 and RF-CDR0 are significantly correlated at the 0.10 level by Pearson's test only. This implies 
that, except for the CDR0 right hippocampi, the distances at baseline tend to increase together with the 
distances at follow-up. That is, as the morphometric differences from the template hippocampus increase at 
baseline, so do the differences from the template at follow-up (except for CDR0 right hippocampi).

From the correlation analysis of left and right distances, we observe that overall left and 
right distances at baseline (LDB and RDB) are significantly correlated at the .05 level, but LDF and RDF are 
correlated at the .05 level by Pearson's test only, and at the 0.10 level by Spearman's test. LB-CDR0 and RB-CDR0 
have significant correlation at the .05 level. However, the correlation coefficients are not that large, which 
suggests mild correlation between left and right metric distances. That is, as the morphometric differences of 
left hippocampi from the left template increase, differences of right hippocampi from the right template tend 
to increase slightly.

3.8 Comparison of Distributions of the Metric Distances 

The samples (groups) should be independent for these tests to be valid, so we only compare LB-CDR0.5 
vs LB-CDR0, RB-CDR0.5 vs RB-CDR0, LF-CDR0.5 vs LF-CDR0, and RF-CDR0.5 vs RF-CDR0. The 
corresponding p-values for the two-sided and one-sided cdf comparison tests are provided in Table 10, 
where $p_{KS}$ is the p-value for the two-sided K-S test, with (l) and (g) abbreviating "first cdf less than 
the second" and "first cdf greater than the second", respectively, $p_C$ is the p-value for Cramér's test, and 
$p_{CvM}$ is the p-value for the Cramér-von Mises test [65, 72].

Notice that at a = 0.05 level, the cdf of RF-CDR0.5 distances is significantly smaller than the cdf of 
RF-CDR0 distances (p = 0.0259 for K-S test). That is, RF-CDR0.5 metric distances are stochastically larger 
than RF-CDR0 right metric distances. In other words, RF-CDR0.5 hippocampus shapes are more likely to 
be different than the template hippocampus compared to RF-CDR0 hippocampus shapes. Furthermore, the 
cdf of LF-CDR0.5 distances is significantly smaller than the cdf of LF-CDR0 distances (p = 0.0604 by K-S 
test and p = 0.0495 by Cramer's test); that is, LF-CDR0.5 metric distances are stochastically larger than 
LF-CDR0 metric distances. See Figure 7 for the corresponding cdf plots. Observe that these results are in 
agreement with the ones in Table 7.
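
The relation between "smaller cdf" and "stochastically larger" can be checked with a short Python sketch of the two-sample K-S statistics based on empirical cdfs. The function name and the example data are hypothetical, and only the statistics are computed, not the p-values.

```python
from bisect import bisect_right

def ks_statistics(x, y):
    """Two-sample K-S statistics from empirical cdfs F_x and F_y.

    Returns (d_plus, d_minus, d) where
      d_plus  = max over t of F_x(t) - F_y(t),
      d_minus = max over t of F_y(t) - F_x(t),
      d       = max(d_plus, d_minus), the two-sided statistic.
    """
    xs, ys = sorted(x), sorted(y)
    d_plus = d_minus = 0.0
    for t in sorted(set(xs) | set(ys)):
        fx = bisect_right(xs, t) / len(xs)
        fy = bisect_right(ys, t) / len(ys)
        d_plus = max(d_plus, fx - fy)
        d_minus = max(d_minus, fy - fx)
    return d_plus, d_minus, max(d_plus, d_minus)

# Hypothetical distances: the second sample tends to take larger values.
controls = [2.0, 2.5, 3.0, 3.5]
patients = [3.2, 3.8, 4.1, 4.6]
print(ks_statistics(controls, patients))
```

In this example the cdf of `patients` lies below that of `controls` everywhere, so `d_minus` is zero and `d_plus` is large: a smaller cdf corresponds to values that are stochastically larger.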

3.9 Logistic Discrimination with Metric Distances 

We model the probability that the subject has CDR0.5 given the hippocampal LDDMM distances of the 
subject for left and right hippocampi at baseline and follow-up. First we consider the full logistic model 
(designated as $M_I(D)$) with the response logit $p = \log[p/(1 - p)]$ where $p = P(Y = 1)$ (i.e., the probability 
that the condition of the subject is CDR0.5); side, timepoint, and distance with all possible interactions are 
the predictor variables. When the stepwise model selection procedure is applied, the resulting model is 
logit $p_k = \beta_0 + \beta_1 d_{ijk}$, where $p_k$ is the probability of subject $k$ having DAT, $d_{ijk}$ is the distance for 
subject $k$ with diagnosis $i$ ($i = 1$ for CDR0 and 2 for CDR0.5) at timepoint $j$ ($j = 1$ for baseline and 2 for 
follow-up), $\beta_0$ is the intercept, and $\beta_1$ is the slope of the fitted line. However, the graph of the proportions 
of CDR0.5 subjects for grouped metric distances in Figure 8 suggests that the relationship is a quadratic one (in 
fact, we found that the higher order distance terms are not significant). That is, the analysis of deviance table 
indicates that only the linear and quadratic terms are significant (p = 0.001 and p = 0.010). So the resulting 
model is 

$M_{II}(D)$: logit $p_k = \beta_0 + \beta_1 d_{ijk} + \beta_2 d_{ijk}^2$    (9) 

where $\beta_2$ is the coefficient of the quadratic term.
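
To show how a fitted model of the form in Equation (9) turns a distance into a class label, here is a minimal Python sketch. The coefficient values are hypothetical placeholders (the fitted coefficients are not reported in this section), and the function names are ours.

```python
import math

def prob_cdr05(d, beta0, beta1, beta2=0.0):
    """P(CDR0.5 | distance d) under logit p = beta0 + beta1*d + beta2*d**2."""
    eta = beta0 + beta1 * d + beta2 * d * d
    return 1.0 / (1.0 + math.exp(-eta))

def classify(d, coeffs, threshold=0.5):
    """Label a hippocampus as CDR0.5 when the predicted probability reaches the threshold."""
    return "CDR0.5" if prob_cdr05(d, *coeffs) >= threshold else "CDR0"

# Hypothetical coefficients (beta0, beta1, beta2), for illustration only.
coeffs = (-4.0, 1.0, 0.0)
for d in (3.0, 3.7, 5.0):
    print(d, round(prob_cdr05(d, *coeffs), 3), classify(d, coeffs), classify(d, coeffs, 18 / 44))
```

Lowering the threshold from 1/2 to 18/44 moves borderline cases (here d = 3.7) into the CDR0.5 class, which is the mechanism by which a smaller threshold raises sensitivity at the expense of specificity.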



Using this logistic classifier with $p_o = 0.5$, we obtain classification summary matrix A in Table 11 for the 
176 hippocampus MR images in this data set. The labels on the left margin show the groups into which the 
hippocampus MRIs are classified, while the top margin shows the groups from which these MRI images 
come. Observe that 95 out of 104 (91%) of the hippocampus MRIs from CDR0 subjects would be classified 
correctly and 20 out of 72 (28%) of the hippocampus MRIs from CDR0.5 subjects are classified correctly. 
However, in this logistic discrimination procedure, we treat each hippocampus from left, right, baseline or 
follow-up hippocampi as a distinct subject. From a clinical point of view, each subject has four hippocampus 
MRIs in this study, and one MRI classified as CDR0.5 would suffice to classify the subject as CDR0.5, while 
all four MRIs should be classified as CDR0 for the subject to be classified as CDR0. With this classification 
rule, we obtain the classification matrix B in Table 11. Notice that 18 out of 26 (69.2%) of the CDR0 subjects 
would be classified correctly and that 10 out of 18 (55.6%) of the CDR0.5 subjects are classified correctly.
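
The subject-level rule just described (one CDR0.5-labeled MRI suffices; all four must be CDR0 otherwise) can be written as a one-line Python function. The function name and the example labels are hypothetical.

```python
def classify_subject(image_labels):
    """Subject is CDR0.5 if any of the subject's MRIs is labeled CDR0.5, else CDR0."""
    return "CDR0.5" if any(label == "CDR0.5" for label in image_labels) else "CDR0"

# Labels for the four MRIs of a subject: left/right at baseline/follow-up.
print(classify_subject(["CDR0", "CDR0", "CDR0", "CDR0"]))    # CDR0
print(classify_subject(["CDR0", "CDR0", "CDR0", "CDR0.5"]))  # CDR0.5
```

By construction this rule can only increase the number of subjects labeled CDR0.5 relative to a per-image rule, so it trades specificity for sensitivity.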

However, as we have seen in Section 3.1.3, due to group-by-timepoint interaction, we need to consider 
diagnosis groups at each time point. When we use $d_k^B$ and $d_k^F$ one at a time in a logistic model, we see that 
only the model 

$M_{III}(D)$: logit $p_k = \beta_0 + \beta_1 d_k^F + \beta_2 (d_k^F)^2$    (10) 

has significant coefficients for the distance terms. Using this logistic model in the logistic classifier, we get the 
classification matrix C in Table 11. Notice that 22 out of 26 (85%) of the CDR0 subjects would be classified 
correctly and that 10 out of 18 (56%) of the CDR0.5 subjects are classified correctly.

Moreover, when we use $d_k^{LB}$, $d_k^{LF}$, $d_k^{RB}$, and $d_k^{RF}$ one at a time in a logistic model, we see that only the 
model 

$M_{IV}(D)$: logit $p_k = \beta_0 + \beta_1 d_k^{RF}$    (11) 

has a significant coefficient for the distance term. Using $d_k^{RF}$ and this logistic model in the logistic classifier, 
we get the classification matrix D in Table 11. Notice that 22 out of 26 (85%) of the CDR0 subjects would 
be classified correctly and that 8 out of 18 (44%) of the CDR0.5 subjects are classified correctly. The above 
classification matrices are almost the same as those obtained with leave-one-out cross-validation with logistic 
discrimination (not presented).

We also calculate the sensitivity and specificity of the classification procedures summarized in Table 11. 
Sensitivity is the proportion of subjects that are classified to be CDR0.5 (i.e., positive) of all CDR0.5 subjects. 
That is, sensitivity is defined as 

$P_{sens} = \frac{T_{CDR0.5}}{N_{CDR0.5}} \times 100\%$ 

where $T_{CDR0.5}$ is the number of correctly classified CDR0.5 subjects and $N_{CDR0.5}$ is the total number of CDR0.5 
subjects in the data set (i.e., $N_{CDR0.5} = 18$ in our data). Notice that the higher the sensitivity, the fewer real 
cases of DAT go undetected. Specificity is the proportion of subjects that are classified CDR0 (i.e., negative, 
control, or healthy) of all CDR0 subjects; that is, 

$P_{spec} = \frac{T_{CDR0}}{N_{CDR0}} \times 100\%$ 

where $T_{CDR0}$ is the number of correctly classified CDR0 subjects and $N_{CDR0}$ is the total number of CDR0 
subjects in the data set (i.e., $N_{CDR0} = 26$ in our data). Notice that the higher the specificity, the fewer 
healthy people are labeled as sick. The correct classification rates, sensitivity 
and specificity percentages for the classification matrices A-D are presented in Table 12. Observe that the best 
classification performance is with the logistic model $M_{III}(D)$ in Equation (10) with one CDR0.5-labeled 
hippocampus enough to label the subject as CDR0.5 (see matrix C in Table 11). Furthermore, in these 
classification procedures, specificity rates are (significantly) larger than the sensitivity rates.
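
As a quick arithmetic check of these definitions, the Python sketch below computes sensitivity, specificity, and the correct classification rate from the counts quoted above for classification matrix C (22 of 26 CDR0 subjects and 10 of 18 CDR0.5 subjects classified correctly). The function name is ours.

```python
def rates(t_cdr0, n_cdr0, t_cdr05, n_cdr05):
    """Return (sensitivity, specificity, correct classification rate) in percent."""
    sensitivity = 100.0 * t_cdr05 / n_cdr05
    specificity = 100.0 * t_cdr0 / n_cdr0
    correct = 100.0 * (t_cdr0 + t_cdr05) / (n_cdr0 + n_cdr05)
    return sensitivity, specificity, correct

# Counts quoted in the text for classification matrix C.
sens, spec, correct = rates(t_cdr0=22, n_cdr0=26, t_cdr05=10, n_cdr05=18)
print(f"sensitivity = {sens:.1f}%, specificity = {spec:.1f}%, correct = {correct:.1f}%")
```

For matrix C this gives a sensitivity of about 55.6% and a specificity of about 84.6%, consistent with the rounded 56% and 85% quoted above.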

We could also change the threshold probability $p_o$ in Equation (5). The correct classification rates, 
sensitivity, and specificity percentages with models $M_I(D)$ to $M_{IV}(D)$ and $p_o \in \{1/2, 18/44\}$ are presented in 
Table 13. Observe that with $p_o = 1/2$ the best classifier is based on $M_{III}(D)$ and with $p_o = 18/44$ the best 
classifier is based on $M_{IV}(D)$. Setting $p_o = 18/44$ (the proportion of CDR0.5 subjects in the data set) we 
get higher sensitivity rates than those with $p_o = 1/2$. However, as $p_o$ decreases, the correct classification rate 
and specificity tend to decrease.

One can optimize the threshold value of $p_o$ in Equation (5) to maximize the correct classification rates 
and minimize the misclassification rates using an appropriately chosen cost function. For example, one can 
consider the cost function 

$C_1(p_o, w_1, w_2) = -(T_{CDR0} - F_{CDR0})^{w_1} (T_{CDR0.5} - F_{CDR0.5})^{w_2}$,    (12) 

where $w_1$, $w_2$ are positive odd numbers, $F_{CDR0}$ is the number of CDR0.5 subjects classified (falsely) as 
CDR0, and $F_{CDR0.5}$ is the number of CDR0 subjects classified (falsely) as CDR0.5. Notice that minimizing 
this cost function will maximize the correct classification rates and minimize the misclassification rates. The 
correct classification rates, sensitivity, and specificity rates are provided in Table 13. Using $w_1 = w_2 = 1$, 
optimal threshold values are $p_o = 0.5$ for model $M_{II}(D)$ in Equation (9), $p_o = 0.45$ for model $M_{III}(D)$ in 
Equation (10), and $p_o = 0.38$ for model $M_{IV}(D)$ in Equation (11). The specificity rates are 69%, 
73%, and 69%, respectively. The sensitivity rates are 56%, 67%, and 72%, respectively. Obviously, from a 
clinical point of view, misclassifying a CDR0.5 subject as CDR0 (i.e., classifying a diseased subject as healthy) 
might be less desirable, since a subject labeled as CDR0.5 will undergo further screening but a subject labeled 
as CDR0 will be released. So the parameters $w_1$ and $w_2$ could be modified to reflect such practical concerns, 
and then a different set of threshold $p_o$ values could be found. For example, we set $w_1 = 1$ and $w_2 = 3$, 
which favors correct classification of CDR0.5 subjects more than that of CDR0 subjects (i.e., favors higher 
sensitivity). Observe that with $w_1 = w_2 = 1$ the best classifier is based on model $M_{IV}(D)$ and with $w_1 = 1$ 
and $w_2 = 3$ the best classifier is based on model $M_{III}(D)$.
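
A threshold search with a cost of the form in Equation (12) can be sketched as follows. This is a minimal illustration, assuming the cost is the negative product of the two powered differences as described above; the predicted probabilities, labels, grid of thresholds, and function names are all hypothetical.

```python
def counts(probs, labels, threshold):
    """Return (T_CDR0, F_CDR0, T_CDR05, F_CDR05); labels: 1 = CDR0.5, 0 = CDR0."""
    t0 = f0 = t05 = f05 = 0
    for p, y in zip(probs, labels):
        predicted = 1 if p >= threshold else 0
        if y == 0 and predicted == 0:
            t0 += 1          # CDR0 correctly classified
        elif y == 1 and predicted == 0:
            f0 += 1          # CDR0.5 falsely classified as CDR0
        elif y == 1 and predicted == 1:
            t05 += 1         # CDR0.5 correctly classified
        else:
            f05 += 1         # CDR0 falsely classified as CDR0.5
    return t0, f0, t05, f05

def cost_c1(probs, labels, threshold, w1=1, w2=1):
    """Cost of the assumed form -(T0 - F0)**w1 * (T05 - F05)**w2 (w1, w2 odd)."""
    t0, f0, t05, f05 = counts(probs, labels, threshold)
    return -((t0 - f0) ** w1) * ((t05 - f05) ** w2)

def best_threshold(probs, labels, grid, w1=1, w2=1):
    """Threshold on the grid with the smallest cost."""
    return min(grid, key=lambda p: cost_c1(probs, labels, p, w1, w2))

# Hypothetical predicted probabilities and true labels.
probs = [0.1, 0.2, 0.3, 0.35, 0.6, 0.4, 0.55, 0.7, 0.8]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1]
grid = [0.3, 0.4, 0.5, 0.6]
print(best_threshold(probs, labels, grid))
print(best_threshold(probs, labels, grid, w1=1, w2=3))
```

Raising $w_2$ relative to $w_1$ increases the weight of the CDR0.5 term in the product, so thresholds that classify more CDR0.5 subjects correctly are rewarded more strongly.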

Alternatively, we can maximize the sensitivity and specificity rates by minimizing the following cost function 

$C_2(p_o, \eta_1, \eta_2) = -\eta_1 \frac{T_{CDR0} - F_{CDR0}}{N_{CDR0}} - \eta_2 \frac{T_{CDR0.5} - F_{CDR0.5}}{N_{CDR0.5}}$,    (13) 

where $\eta_1, \eta_2 > 0$ and $\eta_1 + \eta_2 = 1$. Notice that as either sensitivity or specificity increases, the cost 
function $C_2(p_o, \eta_1, \eta_2)$ decreases. With $\eta_1 = \eta_2 = 0.5$ the best classifier is based on model $M_{IV}(D)$ and 
with $\eta_1 = 0.3$, $\eta_2 = 0.7$ the best classifier is based on model $M_{III}(D)$. Observe that from $\eta_1 = \eta_2 = 0.5$ to 
$\eta_1 = 0.3$, $\eta_2 = 0.7$, sensitivity increases, while correct classification rate and specificity tend to decrease.
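
The weighted cost can be evaluated directly from the four counts. The Python sketch below assumes the cost is the negative weighted sum of the two normalized differences, as in Equation (13); the function name is ours, and the counts used are those implied by the figures quoted earlier for classification matrix C (22 of 26 CDR0 and 10 of 18 CDR0.5 subjects classified correctly).

```python
def cost_c2(t_cdr0, f_cdr0, t_cdr05, f_cdr05, n_cdr0, n_cdr05, eta1=0.5, eta2=0.5):
    """Assumed form: -(eta1*(T0 - F0)/N0 + eta2*(T05 - F05)/N05), with eta1 + eta2 = 1."""
    return -(eta1 * (t_cdr0 - f_cdr0) / n_cdr0 + eta2 * (t_cdr05 - f_cdr05) / n_cdr05)

# Matrix C: 22 of 26 CDR0 correct (4 falsely CDR0.5), 10 of 18 CDR0.5 correct (8 falsely CDR0).
equal = cost_c2(22, 8, 10, 4, 26, 18, eta1=0.5, eta2=0.5)
favor_sensitivity = cost_c2(22, 8, 10, 4, 26, 18, eta1=0.3, eta2=0.7)
print(round(equal, 4), round(favor_sensitivity, 4))
```

Comparing such values across candidate thresholds or models, for a fixed pair of weights, identifies the classifier with the smallest cost; a larger $\eta_2$ makes the comparison more sensitive to how CDR0.5 subjects are classified.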



4 Analysis of Hippocampal Volumes 

The LDDMM distance gives one number reflecting the global size and shape. Volume measurements were 
presented in detail in [29]. The LB-CDR0 subjects had an average hippocampal volume of 2081 (± 354) mm³ 
while RB-CDR0 subjects had 2600 (± 481) mm³. The LB-CDR0.5 subjects had an average hippocampal 
volume of 1717 (± 224) mm³ and RB-CDR0.5 had 2186 (± 370) mm³. On the other hand, LF-CDR0 
subjects showed a volume reduction of 82 mm³ (4.0%, NS) and RF-CDR0 subjects showed a reduction 
of 142 mm³ (5.5%, NS), where NS stands for "not significant". LF-CDR0.5 subjects had a hippocampal volume 
reduction of 164 mm³ (8.3%, p = 0.03) and RF-CDR0.5 subjects had a reduction of 236 mm³ (10.2%, p = 0.05) 
on the right side. Repeated-measures ANOVA showed both significant change over time (within group, 
F = 98.97, df = 1,42, p < .0001) and significant time-group interaction (F = 7.81, df = 1,42, p = 0.0078) 
in the hippocampal volumes. The time-group interaction persisted when covaried with baseline total cerebral 
brain volume (p = 0.0066) or with baseline total intracranial volume (p = 0.0077). In order to take into 
account variations in between-visit intervals (mean 2.11 ± 0.47 years), scan interval was also used as a 
covariate in the volumes comparison. Again, the significant time-group interaction persisted after covarying 
for scan intervals (p = 0.015).



4.1 Repeated Measures Analysis of Hippocampal Volumes 

We repeat the same modeling procedure of Section 3.1 on the hippocampal volumes. For modeling 
hippocampal volumes using the repeated measures ANOVA with group as main effect and compound symmetry 
in Var-Cov structure and volume measurements repeated over time for each subject, the model is 

$V_{ijk} = \mu + \alpha_i^D + \alpha_j^T + \alpha_{ij}^{DT} + \varepsilon_{ijk}$,    (14) 

where $V_{ijk}$ is the volume for subject $k$ with diagnosis $i$ at timepoint $j$, $\mu$ is the overall mean, $\alpha_i^D$ is the 
effect of diagnosis level $i$, $\alpha_j^T$ is the effect of timepoint level $j$, $\alpha_{ij}^{DT}$ is the diagnosis-by-timepoint 
interaction, i.e., the part of the mean volume not attributable to the additive effect of diagnosis and timepoint, 
and $\varepsilon_{ijk}$ is the error term. 
Notice that the effect of side (left or right) is ignored in this model. There is a significant group main effect (F = 
17.54, df = 1,42, p = 0.0001) and within-group timepoint main effect (F = 9.87, df = 1,130, p = 0.0021), 
but the group-by-timepoint interaction is not significant (F = 0.84, df = 1,130, p = 0.3624). This implies 
that the main effect of group is meaningful and about the same at each timepoint. Moreover, the 
groups do change in morphometry over time.



For modeling volumes using the repeated measures ANOVA with side as main effect and compound 
symmetry in Var-Cov structure and volume measurements repeated over time, the corresponding model is 

$V_{ijk} = \mu + \alpha_i^S + \alpha_j^T + \alpha_{ij}^{ST} + \varepsilon_{ijk}$,    (15) 

where $V_{ijk}$ is the volume for subject $k$ for side $i$ ($i = 1$ for left; 2 for right) at timepoint $j$, $\mu$ is the overall 
mean, $\alpha_i^S$ is the effect of side level $i$, $\alpha_j^T$ is the effect of timepoint level $j = 1, 2$, $\alpha_{ij}^{ST}$ is the 
side-by-timepoint interaction, and $\varepsilon_{ijk}$ is the error term. Notice that the effect of diagnosis (CDR0 or CDR0.5) 
is ignored in this model. The side and timepoint main effects are both significant (F = 377.21, df = 1,129, 
p < .0001 and F = 38.31, df = 1,129, p < .0001, respectively), but the side-by-timepoint interaction is not 
significant (F = 1.84, df = 1,129, p = 0.1769). Consequently, we conclude that the lines that join mean volumes 
in the interaction plot are parallel and far apart, and the main effect of side is meaningful and about the 
same at each timepoint. Moreover, the left and right hippocampi do change in morphometry over time.

For the model that includes the diagnosis, side, and diagnosis-by-side interaction, we use the same model 
selection criteria as in Section 3.1.3. We find that the most promising model based on likelihood ratio test, BIC, 
and AIC is the one with unstructured Var-Cov matrix. The corresponding model with significant terms at 
the α = .05 level is 

$V_{ijkl} = \mu + \alpha_i^S + \alpha_j^D + \alpha_k^T + \alpha_{ik}^{ST} + \alpha_{jk}^{DT} + \varepsilon_{ijkl}$,    (16) 

where $V_{ijkl}$ is the volume for subject $l$ for side $i$ with diagnosis $j$ at timepoint $k$, $\mu$ is the overall mean, 
$\alpha_i^S$ is the effect of side level $i$, $\alpha_j^D$ is the effect of diagnosis level $j$, $\alpha_k^T$ is the effect of timepoint level 
$k = 1, 2$, $\alpha_{ik}^{ST}$ is the side-by-timepoint interaction, $\alpha_{jk}^{DT}$ is the diagnosis-by-timepoint interaction, and 
$\varepsilon_{ijkl}$ is the error term. The (unstructured) Var-Cov structure for the error term is 

$\begin{bmatrix} \sigma_1^2 & & & \\ \sigma_{21} & \sigma_2^2 & & \\ \sigma_{31} & \sigma_{32} & \sigma_3^2 & \\ \sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma_4^2 \end{bmatrix}$ 

The main effects of side, group, and timepoint are all significant (F = 120.10, df = 1,170, p < .0001; 
F = 25.25, df = 1,170, p < .0001; and F = 89.53, df = 1,170, p < 0.0001, respectively). But due to 
interaction, the main effect for diagnosis (i.e., group) is close to clinically meaningless, i.e., the group means 
should be compared at each time point or hemisphere instead of comparing the overall means of the groups. 
However, the significant main effects of group and side are interpretable between baseline and follow-up.

4.2 Post-Hoc Comparison of Hippocampal Volumes for Differences in Group, 
Time, and Hemisphere 

We repeat the analysis procedure of Section 3 on hippocampal volumes. We find that left hippocampus 
volumes are significantly smaller than right hippocampus volumes at both baseline and follow-up 
(i.e., there is significant volumetric left-right asymmetry in hippocampi); baseline volumes are larger than 
follow-up volumes for both left and right hippocampi (i.e., there is significant reduction in volume over time) 
(p < .0001 for each comparison). The means and standard deviations of the volumes for left and right 
hippocampi of each group are provided in Table 1. We observe the same trend in the overall comparison for 
each group also. However, left-right volumetric asymmetry significantly reduces over time in the CDR0.5 group 
(p = .0407), but the same holds only marginally in the CDR0 group (p = .0524). The level of left-right volumetric 
asymmetry is about the same in the CDR0 and CDR0.5 groups at baseline (p = .3495) and follow-up 
(p = .4853). The volumes decrease significantly over time in the CDR0 group for both left and right hippocampi 
(p < .0001 for both); the same holds for the CDR0.5 group (p = .0001 for both). The volumetric reduction 
is significantly larger in CDR0.5 right hippocampi compared to CDR0.5 left hippocampi (p = .0407), but 
the same holds only marginally in the CDR0 group (p = .0524). On the other hand, the volumetric reduction is 
significantly larger in CDR0.5 left hippocampi compared to CDR0 left hippocampi (p = .0108); the same 
holds for right hippocampi also (p = .0418). The variances of volumes are not significantly different for 
(LB-CDR0.5, LB-CDR0), (RB-CDR0.5, RB-CDR0), and (RF-CDR0.5, RF-CDR0) groups, but volumes of 
LF-CDR0 hippocampi are significantly larger than LF-CDR0.5 hippocampi (p = .0268). The CDR0.5 volumes 
are significantly smaller than CDR0 volumes in left hippocampi at baseline (p = .0001) and follow-up (p < 
.0001), and for right hippocampi at baseline (p = .0071) and follow-up (p = .0001). The CDR0.5 volumes 
are stochastically smaller than CDR0 volumes for left hippocampi at baseline (p = .0007) and follow-up 
(p = .0003), and for right hippocampi at baseline (p = .0064) and follow-up (p = .0028). 
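
The "stochastically smaller" comparisons are presumably based on the Wilcoxon rank sum test named in the caption of Table 1. A minimal sketch of the equivalent Mann-Whitney U statistic is given below, with hypothetical volumes rather than the study's data; the function names are ours, and no p-value is computed here.

```python
def mann_whitney_u(x, y):
    """U statistic for sample x: count of pairs with x_i > y_j, ties counted as 1/2."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def prob_x_below_y(x, y):
    """Estimate of P(X < Y) + 0.5 * P(X = Y) from the two samples."""
    return 1.0 - mann_whitney_u(x, y) / (len(x) * len(y))


# Hypothetical hippocampal volumes (mm^3), for illustration only.
cdr05_volumes = [1700.0, 1800.0, 1650.0]
cdr0_volumes = [2000.0, 2100.0, 1750.0, 2200.0]

u_stat = mann_whitney_u(cdr05_volumes, cdr0_volumes)
p_below = prob_x_below_y(cdr05_volumes, cdr0_volumes)
print(u_stat, p_below)
```

An estimate of P(X < Y) well above 1/2 is what "stochastically smaller" means for the first sample; the rank sum test assesses whether that departure from 1/2 is larger than chance would produce.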



4.3 Logistic Discrimination with Hippocampal Volumes 

We apply the logistic discrimination methods of Section 3.7 to hippocampal volumes. First we consider the 
full logistic model (designated as $M_I(V)$) with side, timepoint, and volume with all possible interactions 
being the predictor variables. We apply the same stepwise elimination procedure as in Section 3.7 and get 
the following reduced model: 

$$M_{II}(V):\ \operatorname{logit} p_l = \beta_0 + \alpha_k^S + \beta_1 V_{ijkl}, \tag{17}$$

where $p_l$ is the probability of subject $l$ having DAT and $V_{ijkl}$ is the volume for subject $l$ with diagnosis $i$ ($i = 1$ 
for CDR0 and 2 for CDR0.5) at timepoint $j$ ($j = 1$ for baseline and 2 for follow-up) with side $k$ ($k = 1$ for 
left and 2 for right), $\beta_0$ is the overall intercept, $\alpha_k^S$ is the effect of side level $k$, and $\beta_1$ is the slope of the fitted 
line. 
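
To make the reduced model concrete, the following sketch turns a linear predictor of the form intercept + side effect + slope × volume into a probability of DAT through the inverse logit. All coefficient values are hypothetical, chosen only so that smaller volumes give higher probabilities; they are not the fitted estimates from this study.

```python
import math


def prob_dat(volume, side, beta0, alpha_side, beta1):
    """P(DAT) from logit p = beta0 + alpha_side[side] + beta1 * volume."""
    eta = beta0 + alpha_side[side] + beta1 * volume
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients, for illustration only.
beta0 = 8.0
alpha_side = {"left": 0.0, "right": 1.0}
beta1 = -0.004  # negative slope: larger volume, lower probability of DAT

p_small = prob_dat(1700.0, "left", beta0, alpha_side, beta1)
p_large = prob_dat(2100.0, "left", beta0, alpha_side, beta1)
print(round(p_small, 3), round(p_large, 3))
```

A subject is then classified as demented when this probability reaches the threshold probability, which is the quantity varied in the classification results that follow.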

However, as we have seen in Section 3.1.3, due to group-by-timepoint interaction, we need to consider 
diagnosis groups at each time point. When we use $V_{kl}^B$ and $V_{kl}^F$, one at a time in a logistic model, we see that 
the model 

$$M_{III}(V):\ \operatorname{logit} p_l = \beta_0 + \alpha_k^S + \beta_1 V_{kl}^F \tag{18}$$
has the most significant coefficients for the volume terms. 

Moreover, when we use $V_l^{LB}$, $V_l^{LF}$, $V_l^{RB}$, and $V_l^{RF}$ one at a time in a logistic model, we see that the 
following model has the best fit. 

$$M_{IV}(V):\ \operatorname{logit} p_l = \beta_0 + \beta_1 V_l^{LF} \tag{19}$$

The classification rates are presented in Table 15. Observe that with $p_0 = 1/2$ the best classifier is based 
on model $M_{III}(V)$ and with $p_0 = 18/44$ the best classifier is based on model $M_{IV}(V)$. Furthermore, as $p_0$ 
decreases from 1/2, sensitivity increases but the correct classification rate and specificity decrease. We use 
the cost function $C_1(p_0, w_1, w_2)$ with $w_1 = w_2 = 1$ and with $w_1 = 1$ and $w_2 = 3$ to calculate the optimal $p_0$ 
values for each of the models $M_I(V)$ to $M_{IV}(V)$. Observe that with $w_1 = w_2 = 1$ the best classifier is based 
on model $M_{IV}(V)$ and with $w_1 = 1$ and $w_2 = 3$ the best classifier is based on model $M_{III}(V)$. We find the 
optimal $p_0$ values based on the cost function $C_2(p_0, \eta_1, \eta_2)$ with $\eta_1 = \eta_2 = 0.5$ and with $\eta_1 = 0.3$, $\eta_2 = 0.7$ 
for each of models $M_I(V)$ to $M_{IV}(V)$. With $\eta_1 = \eta_2 = 0.5$ the best classifier is based on model $M_{IV}(V)$ and 
with $\eta_1 = 0.3$, $\eta_2 = 0.7$ the best classifier is based on model $M_I(V)$. Observe that from $\eta_1 = \eta_2 = 0.5$ to 
$\eta_1 = 0.3$, $\eta_2 = 0.7$, sensitivity increases, while correct classification rate and specificity tend to decrease. 
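
The effect of the threshold probability on the classification rates can be sketched as follows. The predicted probabilities and labels are hypothetical, and the weighted misclassification count used here is only a stand-in for the cost functions of Section 3.7, whose exact form is not restated in this section. Note that 18/44 equals the proportion of CDR0.5 subjects in the sample (18 of 44 in Table 1), which is presumably why it is used as the second threshold.

```python
def classification_rates(probs, labels, p0):
    """Rates when a subject is classified as DAT if prob >= p0 (label 1 = CDR0.5, 0 = CDR0)."""
    tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= p0)
    tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < p0)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return {
        "correct": (tp + tn) / len(labels),
        "sensitivity": tp / n_pos,
        "specificity": tn / n_neg,
    }


def weighted_cost(probs, labels, p0, w_fp, w_fn):
    """Stand-in cost: w_fp * (false positives) + w_fn * (false negatives)."""
    fp = sum(1 for p, y in zip(probs, labels) if y == 0 and p >= p0)
    fn = sum(1 for p, y in zip(probs, labels) if y == 1 and p < p0)
    return w_fp * fp + w_fn * fn


def best_threshold(probs, labels, w_fp, w_fn, grid):
    """First threshold on the grid that minimizes the stand-in cost."""
    return min(grid, key=lambda p0: weighted_cost(probs, labels, p0, w_fp, w_fn))


# Hypothetical fitted probabilities, for illustration only.
probs = [0.9, 0.7, 0.45, 0.3, 0.6, 0.4, 0.35, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
grid = [i / 10 for i in range(1, 10)]

rates_half = classification_rates(probs, labels, 0.5)
rates_prev = classification_rates(probs, labels, 18 / 44)
p0_equal = best_threshold(probs, labels, 1, 1, grid)
p0_fn_heavy = best_threshold(probs, labels, 1, 3, grid)
print(rates_half, rates_prev, p0_equal, p0_fn_heavy)
```

Lowering the threshold can only keep or raise sensitivity, and weighting false negatives more heavily pushes the optimal threshold down, which is consistent with the pattern reported above.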



5 Comparison of Hippocampal Volumes and Metric Distances 

Although volume is a measure of size and metric distance is a measure of overall morphometric difference 
from a template, the repeated measure analysis and post-hoc analysis of volumes and metric distances provide 
similar results. The main difference is that volumes tend to decrease, while LDDMM distances tend to increase 
by time. 

The logistic discrimination models are similar, except that model $M_{IV}(D)$ for metric distances contains right 
follow-up distances, while model $M_{IV}(V)$ for volumes contains left follow-up volumes. The classification 
performances with $p_0 = 1/2$ and $p_0 = 18/44$ suggest that volume models have better performance than 
the metric distance models (see Tables 13 and 15). Using the optimal $p_0$ values with the cost functions 
$C_1(p_0, w_1, w_2)$ and $C_2(p_0, \eta_1, \eta_2)$, the classification performances are significantly different for models $M_I(V)$ to 
$M_{IV}(V)$ of volumes and $M_I(D)$ to $M_{IV}(D)$ of metric distances. Comparing Tables 13 and 15, we see that logistic 
discrimination with volumes has better performance. 

We apply the logistic discrimination using both volume and metric distance as predictors. We first 
consider the full logistic model (designated as model $M_I(V, D)$) with side, timepoint, volume, and metric 
distance with all possible interactions being predictor variables. We apply the same stepwise elimination 
procedure as in Section 3.7 and get 

$$M_{II}(V, D):\ \operatorname{logit} p_l = \beta_0 + \alpha_k^S + \beta_1 V_{ijkl} + \beta_2 d_{ijkl}^9 + \beta_3 V_{ijkl}\, d_{ijkl},$$

where $p_l$ is the probability of subject $l$ having DAT, $V_{ijkl}$ is the volume and $d_{ijkl}$ the distance for subject $l$ 
with diagnosis $i$ ($i = 1$ for CDR0 and 2 for CDR0.5) at timepoint $j$ ($j = 1$ for baseline and 2 for follow-up) 
with side $k$ ($k = 1$ for left and 2 for right), $\beta_0$ is the overall intercept, $\alpha_k^S$ is the effect of side level $k$, $\beta_1$ is 
the coefficient for volume, $\beta_2$ is the coefficient for the ninth power of the distance, and $\beta_3$ is the coefficient for the 
interaction between volume and distance. When we use baseline or follow-up measures one at a time in a 
logistic model, we see that the model 

M IU (V, D): logit Pl = fa + af + /3 1 V£ l + fa {df u f 

has the most significant coefficients. When we use side-by-timepoint combinations one at a time in a logistic 
model, we see that the following model has the best fit: 

M IV (V, D): logit Vl = fa + fa {d( kl ) 3 + f3 2 V k L l F . 

The corresponding classification rates are presented in Table 16. Observe that considering metric distance 
and volume together in the logistic discrimination procedure with the cost functions $C_1(p_0, w_1, w_2)$ and 
$C_2(p_0, \eta_1, \eta_2)$, we get better classification rates compared to logistic models with only one of metric distance 
or volume being the predictors. 



6 Annual Percentage Rates of Change in Hippocampal Volumes 
and Metric Distances 

Our volume and LDDMM metric comparisons are cross-sectional or longitudinal by construction. However 
these measures might need to be adjusted for anatomic variability, since intersubject variability might add 
substantial amount of noise to volume or distance measurements at baseline or follow-up. There is no simple 
way to correct for this noise in practice. Differential volume loss or distance change over time might be 
self-correcting for such variability. For example, entorhinal cortex volume loss over time was shown to be a 
better indicator for DAT than cross-sectional measurements [73] . 



6.1 Annual Percentage Rate of Change in Hippocampal Volume 

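The hippocampal volume change over time can be written as the following annual percentage rate of change 
(APC) [73]: 

$$V_k^{APC} = \frac{V_k^b - V_k^f}{V_k^b \, T} \times 100\%, \tag{20}$$

where $V_k^b$ and $V_k^f$ are the baseline and follow-up volumes for subject $k$ and $T$ is the interscan interval in years ($T \approx 2$ in our data). 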
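
As a worked illustration of the annual percentage rate of change, the sketch below assumes the usual form of the definition: baseline minus follow-up volume, divided by baseline volume and by the interscan interval, expressed as a percentage. The function name and the example volumes are ours and hypothetical.

```python
def apc_volume(v_baseline, v_followup, interval_years):
    """Annual percentage rate of volume loss, assuming (Vb - Vf) / (Vb * T) * 100."""
    return (v_baseline - v_followup) / (v_baseline * interval_years) * 100.0


# Hypothetical subject: 2000 mm^3 at baseline, 1800 mm^3 two years later.
rate = apc_volume(2000.0, 1800.0, 2.0)
print(rate)
```

Dividing by each subject's own baseline and scan interval puts subjects with different hippocampal sizes and different follow-up times on a common scale, which is the sense in which the rate is self-correcting for intersubject variability.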

For modeling annual percentage rate of change in volume $V^{APC}$ using the repeated measures ANOVA 
with group as main effect, compound symmetry in the Var-Cov structure, and $V^{APC}$ measures repeated over 
side for each subject, the model is 

$$V_{ijk}^{APC} = \mu + \alpha_i^D + \alpha_j^S + \alpha_{ij}^{DS} + \varepsilon_{ijk}, \tag{21}$$

where $V_{ijk}^{APC}$ is the APC in volume for side $j$ of subject $k$ with diagnosis $i$, $\mu$ is the overall mean, $\alpha_i^D$ is the 
effect of diagnosis level $i$ ($i = 1$ for CDR0; 2 for CDR0.5), $\alpha_j^S$ is the effect of side level $j$ ($j = 1$ for left and 
2 for right), $\alpha_{ij}^{DS}$ is the diagnosis-by-side interaction, and $\varepsilon_{ijk}$ is the error term. The diagnosis main effect is 
significant (F = 18.62, df = 1,84, p < .0001) but neither the side main effect (F = 0.72, df = 1,84, p = .3754) 
nor the diagnosis-by-side interaction is significant (F = 0.11, df = 1,84, p = 0.7384). Consequently, we conclude 
that the lines that join the mean $V^{APC}$ values in the interaction plot are parallel and far apart, and that the main effect 
of diagnosis is meaningful and about the same at each hemisphere. The post hoc comparison of 
$V^{APC}$ values indicates that the APC in CDR0.5 volumes is significantly larger than the APC in CDR0 volumes 
(p = .0001). 
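
For reference, a compound symmetry Var-Cov structure for the two repeated measures per subject (left and right) is usually written as below. This is the standard textbook form, stated as an assumption about the parameterization rather than taken from the authors' software output; the symbols $\sigma^2$ and $\rho$ are introduced here for illustration.

```latex
\mathrm{Cov}\begin{pmatrix} \varepsilon_{i1k} \\ \varepsilon_{i2k} \end{pmatrix}
  = \sigma^2 \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}
```

Under this form the left and right measures share one variance and one correlation, so the structure has two free parameters regardless of the number of repeated measures.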

We apply the logistic discrimination methods of Section 3.7 to APC in hippocampal volumes. First we 
consider the full logistic model (designated as $M_I(V^{APC})$) with side and APC in volume with all possible 
interactions being the predictor variables. We apply the same stepwise elimination procedure as in Section 
3.7 and get 

Mn (V APC ) : logit p k = (3 + W£ pc + & {V APC f (22) 
where pk is the probability of subject k having DAT. 

Furthermore, when we use $V_k^{APC,L}$ and $V_k^{APC,R}$ as predictors in a logistic model, we see that the following 
model has the best fit. 

Mm (V APC ) : logit Pk = O + f3 lV APC ' L + f3 2 V APC > R + fa (v APC ' L ) ' . (23) 

The classification rates with $p_0 = 1/2$ and $p_0 = 18/44$ and optimal $p_0$ values with respect to the cost 
functions are presented in Table 17. Observe that the classifier using the cost function $C_2(p_0, \eta_1, \eta_2)$ with 
$\eta_1 = 0.3$, $\eta_2 = 0.7$ in model $M_I(V^{APC})$ has the best performance. Comparing Tables 15 and 17, we observe 
that correct classification rates, sensitivity, and specificity percentages with the classifiers based on APC in 
volume are about the same as those with volume only. Unlike the findings of [73], hippocampal volume loss 
over time is not a better indicator for DAT than cross-sectional measurements. On the other hand, the 
classifier based on volume and distance together performs better compared to models based on only one of 
volume, distance, or APC in volume values. 



6.2 Annual Percentage Rate of Change in LDDMM Metric Distance 

The hippocampal LDDMM metric distance change over time can be written as the following annual percentage 
rate of change: 

$$D_k^{APC} = \frac{d_k^f - d_k^b}{d_k^b \, T} \times 100\%. \tag{24}$$

Notice that to make APC in metric distance positive, we take the difference $d_k^f - d_k^b$ as opposed to the order 
in the APC in volume definition. For modeling $D^{APC}$ using the repeated measures ANOVA with group as 
main effect, compound symmetry in the Var-Cov structure, and $D^{APC}$ measures repeated over side for 
each subject, the model is 

$$D_{ijk}^{APC} = \mu + \alpha_i^D + \alpha_j^S + \alpha_{ij}^{DS} + \varepsilon_{ijk}, \tag{25}$$

where $D_{ijk}^{APC}$ is the APC in LDDMM metric distance for side $j$ of subject $k$ with diagnosis $i$. The other 
terms are as in Equation (21). The diagnosis main effect is significant (F = 4.75, df = 1,84, p = .0320) 
but neither the side main effect (F = 2.29, df = 1,84, p = .1338) nor the diagnosis-by-side interaction is significant 
(F = 0.87, df = 1,84, p = 0.3532). Consequently, we conclude that the lines that join the means in the 
interaction plot are parallel and far apart, and that the main effect of diagnosis is meaningful and about 
the same at each hemisphere. The APC in CDR0.5 LDDMM distances is significantly larger (in absolute 
value) than the APC in CDR0 distances (p = .0036). 

We apply the logistic discrimination methods of Section 3.7 to APC in hippocampal LDDMM distances. 
First we consider the full logistic model (designated as $M_I(D^{APC})$) with side and APC in distance with all 
possible interactions being the predictor variables. We apply the same stepwise elimination procedure as in 
Section 3.7 and get 

$$M_{II}(D^{APC}):\ \operatorname{logit} p_k = \beta_0 + \beta_1 D_{ijk}^{APC}. \tag{26}$$

Furthermore, when we use $D_k^{APC,L}$ and $D_k^{APC,R}$ as predictors in a logistic model, we see that the following 
model has the best fit. 

$$M_{III}(D^{APC}):\ \operatorname{logit} p_k = \beta_0 + \beta_1 D_k^{APC,R}. \tag{27}$$

The classification rates with $p_0 = 1/2$ and $p_0 = 18/44$ and optimal $p_0$ values with respect to the cost 
functions are presented in Table 18. Observe that these classifiers have relatively poor performance. Comparing 
Tables 13 and 18, we observe that the classifiers with the optimal $p_0$ values have much larger sensitivity 
rates, but this increase comes at the expense of a substantial decrease in correct classification rate and specificity. 
Hence hippocampal LDDMM change over time is not a better indicator for DAT than cross-sectional distance 
comparisons. 



6.3 Annual Percentage Rate of Change in Hippocampal Volume and Metric 
Distances 

We apply the logistic discrimination based on distance and APC in volumes. First we consider the full 
logistic model (designated as $M_I(V^{APC}, D)$) with side, APC in volume, and distances with all possible 
interactions being the predictor variables. We apply the same stepwise elimination procedure as in Section 
3.7 and get 

M U (V APC , D) : logit Pk = ft + (3 1 V APC + ft {V APC f + ft & . (28) 

Furthermore, when we use and left and right measures as predictors in a logistic model, we see that the 
following model has the best fit. 

Mm (V APC , D) : logit Pk = ft + frV APC ' L + fi 2 v£ PC > R + f3 3 d PF . (29) 

The classification rates with $p_0 = 1/2$ and $p_0 = 18/44$ and optimal $p_0$ values with respect to the cost 
functions are presented in Table 19. With the cost function $C_1(p_0, w_1 = 1, w_2 = 1)$, the best classifier is 
based on $M_{III}(V^{APC}, D)$, for which the optimal threshold value is $p_0 \approx .37$, the correct classification rate 
is 80%, sensitivity is 78%, and specificity is 81%. Likewise, with the cost function $C_1(p_0, w_1 = 1, w_2 = 3)$, 
the best classifier is based on $M_I(V^{APC}, D)$, for which the optimal threshold value is $p_0 = .56$, the correct 
classification rate is 80%, sensitivity is 78%, and specificity is 81%. On the other hand, with the cost function 
$C_2(p_0, \eta_1 = .5, \eta_2 = .5)$, the best classifier is based on $M_I(V^{APC}, D)$, for which the optimal threshold value is 
$p_0 = .64$, the correct classification rate is 84%, sensitivity is 72%, and specificity is 92%. With the cost function 
$C_2(p_0, \eta_1 = .3, \eta_2 = .7)$, the best classifier is again based on $M_I(V^{APC}, D)$, for which the optimal threshold 
value is $p_0 = .56$, the correct classification rate is 80%, sensitivity is 78%, and specificity is 81%. Comparing 
Tables 16 and 19, we observe that the classifiers based on metric distance and volume usually perform better 
compared to the classifiers based on metric distance and APC in volume. Comparing Tables 17 and 19, we 
observe that adding the metric distance to the logistic model with APC in volume improves the classification 
performance. Hence the model with hippocampal volume loss over time and metric distance is a better 
indicator for DAT compared to either variable used separately in logistic discrimination. 

We also apply the logistic discrimination based on volume, distance, and APC in volumes. First we 
consider the full logistic model (designated as $M_I(V, V^{APC}, D)$) with side, volume, APC in volume, and 
distances with all possible interactions being the predictor variables. We apply the same stepwise elimination 
procedure as in Section 3.7 and get 

M U (V, V APC , D) : logit p k = ft + fcV^ + (i 2 V APC + ft {V APC f + ft (d p ]k f . (30) 

Furthermore, when we use and left and right measures as predictors in a logistic model, we see that the 
following model has the best fit. 

Mm (V, V APC , D) : logit p k = ft + p lV PF + (3 2 dhf + f3 3 V* PC ' R + f3 4 (v£ pc > L ) ' + ft (dg F ) 3 . (31) 

The classification rates with $p_0 = 1/2$ and $p_0 = 18/44$ and optimal $p_0$ values with respect to the cost 
functions are presented in Table 20. With the cost function $C_1(p_0, w_1 = 1, w_2 = 1)$, the best classifier is 
based on $M_{III}(V, V^{APC}, D)$. Comparing Table 20 with Tables 13, 15, 16, 17, 18, and 19, we observe that 
the classifiers based on metric distance, volume, and APC in volumes usually perform better compared to 
the classifiers based on other models. Hence the model with volume, hippocampal volume loss over time, and 
metric distance is a better indicator for DAT compared to other models based on subsets of these variables. 



7 Discussion and Conclusions 

In this study, we used the Large Deformation Diffeomorphic Metric Mapping (LDDMM) algorithm to generate 
metric distances between hippocampi in groups of subjects with and without Dementia of Alzheimer's type 
(DAT) in its mild form (labeled as CDR0.5 and CDR0 patients, respectively) at baseline and at follow-up. 
The subjects in this paper have been previously analyzed using related but different tools. As a single scalar 
measure, volumes were used for diagnosis group comparisons at baseline and follow-up [29], and displacement 
momentum vector fields based on LDDMM were used for discrimination [57]. But the metric distances 
computed from LDDMM have not hitherto been used in diagnosis group analysis. The metric distance gives a 
single number reflecting the global morphometry (i.e., the size and shape), while volume measurements only 
provide information on size. So metric distances provide morphometric information not conveyed by volume, 
whereas momentum vector fields also provide local information on shape changes. Further, the morphometric 
information conveyed by the metric distance depends on the choice of the template, while the morphometric 
information conveyed by momentum vector fields is independent of the template chosen. That is, although 
the vector fields change when the template changes, the morphometric information they convey is the same. 

Previously, it has been shown that hippocampal volume loss and shape deformities observed in subjects 
with DAT distinguished them from both elderly and younger control subjects [10] . The pattern of hippocampal 
deformities in subjects with DAT was largely symmetric and suggested damage to the CA1 hippocampal 
subfield [74]. Hippocampal shape changes were also observed in healthy elderly subjects, which distinguished 
them from healthy younger subjects. These shape changes occurred in a pattern distinct from the pattern 
seen in DAT and were not associated with substantial volume loss [75] . 

Furthermore, Wang et al. [45] analyzed the baseline hippocampi of the same data set and showed that the 
very mild DAT subjects showed significant inward variation in the left and right lateral zones (LZ) and the left and 
right intermediate zones (IMZ), but not in the left and right superior zones (SZ), as compared to CDR0 subjects. 
In their logistic regression analysis, inward variation of the left and right LZ or IMZ by 0.1 mm relative to 
the average of the nondemented subjects increased the odds of the subject being a very mild DAT subject 
rather than being a nondemented subject. The odds ratios for the left and right SZ were not significant. 
These results represented a replication of their previous findings [10] and suggest that inward deformities of 
the hippocampal surface in proximity to the CA1 subfield and subiculum can be used to distinguish subjects 
with very mild DAT from CDR0 subjects [45]. However, although momentum vector fields obtained by the 
LDDMM algorithm can be used to detect such local (i.e., location specific) morphometric changes (as in the 
CA1 subfield), metric distance does not provide such local information, and hence fails to indicate any type of 
laterality differences. 

The main results in this paper are that although metric distances did not detect any significant difference 
in morphometry at baseline (see Table 7), follow-up metric distances for the right hippocampus in CDR0.5 
(i.e., mildly demented) subjects are found to be significantly larger than those in CDR0 (i.e., non-demented) 
subjects (see Table 7). Wang et al. also analyzed the velocity vector fields for the baseline hippocampi of the 
same data set and found that the left hippocampus in the DAT group shows significant shape abnormality 
and the right hippocampus shows a similar pattern of abnormality [57]. Again, the reason for the metric 
failing to detect such abnormality in the baseline hippocampi is that metric distance is a compound and 
oversummarizing measure of global morphometry. From baseline to follow-up, metric distances for CDR0.5 
subjects significantly increase while those in CDR0 subjects do not (see Table 7). That is, the morphometry 
(shape and size) of the hippocampus in CDR0.5 subjects changes significantly over time, but not in CDR0 
subjects. Atrophy over two years might occur with aging, and this is captured by metric distances (see 
Table 7). However, the increase in the metric distances in CDR0 subjects is not found to be statistically 
significant. 

Such differences in morphometry between diagnosis groups or morphometric changes over time can be 
detected by metric distances computed via LDDMM and could potentially serve as a biomarker for the 
disease. Previously, the volumes and velocity vector fields associated with the same data set (i.e., baseline 
and follow-up of both groups) were also analyzed and it was found that hippocampal volume loss over time 
was significantly greater in the CDR0.5 subjects (left = 8.3%, right = 10.2%) than in the CDR0 subjects 
(left = 4.0%, right = 5.5%) (ANOVA, F = 7.81, p = 0.0078). Using singular-value decomposition and 
logistic regression models, [45] quantified hippocampal shape change across time within individuals, and this 
shape change in the CDR0.5 and CDR0 subjects was found to be significantly different (Wilks' Λ, p = 0.014). 
Further, at baseline, CDR0.5 subjects, in comparison to CDR0 subjects, showed inward deformation over 38% 
of the hippocampal surface; after 2 years this difference grew to 47%. Also, within the CDR0 subjects, shape 
change between baseline and follow-up was largely confined to the head of the hippocampus and subiculum, 
while in the CDR0.5 subjects, shape change involved the lateral body of the hippocampus as well as the head 
region and subiculum. These results suggest that different patterns of hippocampal shape change in time as 
well as different rates of hippocampal volume loss distinguish very mild DAT from healthy aging [29]. 



In regard to statistical analysis, as a compound but brief measure of morphometry, metric distances 
can thus serve as a first step to identify the morphometric differences, and can be used as a pointer to 
which direction a clinician or data analyst could go. The importance of the CDR0 versus CDR0.5 contrast 
analyzed here is that it tests a necessary but not sufficient condition for the eventual goal of discriminating 
CDR0 subjects who subsequently convert to CDR0.5 from CDR0 subjects who subsequently stay CDR0. As subjects are 
followed longitudinally and some convert, we have shown that cross-sectional measures of the hippocampal 
structure can be used to predict those who convert. Metric distances may also be used in this way. When 
baseline and follow-up of converted and nonconverted nondemented subjects were analyzed, it was found that 
the inward variation of the lateral zone and left hippocampal volume significantly predicted conversion to 
CDR0.5 in separate Cox proportional hazards models. When hippocampal surface variation and volume were 
included in a single model, inward variation of the lateral zone of the left hippocampal surface was selected 
as the only significant predictor of conversion. The pattern of hippocampal surface deformation observed 
in nondemented subjects who later converted to CDR0.5 was similar to the pattern of hippocampal surface 
deformation previously observed to discriminate subjects with very mild DAT and nondemented subjects. 
These results suggest that inward deformation of the left hippocampal surface in a zone corresponding to the 
CA1 subfield is an early predictor of the onset of DAT in nondemented elderly subjects [74]. This appears to 
contradict our finding that the morphometric changes in CDR0.5 right hippocampi from baseline to follow-up 
are significantly larger than those of CDR0.5 left hippocampi (p = 0.044). The morphometric changes in 
CDR0 right hippocampi from baseline to follow-up are not significantly different from those of CDR0 left 
hippocampi (p = 0.382). The morphometric changes in CDR0.5 left hippocampi from baseline to follow-up 
are not significantly different from those in CDR0 left hippocampi (p = 0.134), while the morphometric 
changes in CDR0.5 right hippocampi from baseline to follow-up are significantly larger than those of CDR0 
right hippocampi (p = 0.007). Therefore, over time, DAT may alter the (global) morphometry of the right 
hippocampus. However, note that the findings in [74] are concerned with changes in (local) subregions of 
hippocampi, while metric distance is concerned with overall morphometric changes. That is, DAT might 
implicate CA1 of the left hippocampi, yet the overall change in morphometry of right hippocampi might be 
more substantial. Moreover, in [74] the converted (from CDR0 to CDR0.5) subjects were analyzed separately, 
whereas we do not consider such conversion in our analysis. Also, metric distance results agree with the volume 
comparisons of [29]; hence volume (i.e., scale) might be highly dominating the morphometric changes in the 
hippocampi. In other words, the significant volume reduction in left and right hippocampi might dominate 
the change in shape when morphometry is measured by metric distances. To remove the size influence so as 
to measure the shapes only, one can perform scaling on the hippocampi and then apply LDDMM to normalize 
the size differences. 
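
One simple way to carry out such a size normalization, offered here as an assumption since the text does not specify the scaling, is an isotropic rescaling of each hippocampal surface to a common reference volume. Volume scales with the cube of a linear scale factor, so the factor is the cube root of the volume ratio. The function names and the example numbers are ours.

```python
def scale_factor_to_volume(volume, target_volume):
    """Isotropic linear scale factor s such that s**3 * volume equals target_volume."""
    return (target_volume / volume) ** (1.0 / 3.0)


def scale_about_center(points, center, s):
    """Scale 3D points by factor s about a fixed center (e.g., the surface centroid)."""
    return [
        tuple(c + s * (p - c) for p, c in zip(point, center))
        for point in points
    ]


# Hypothetical example: rescale a 1000 mm^3 shape to 8000 mm^3.
s = scale_factor_to_volume(1000.0, 8000.0)
scaled = scale_about_center([(1.0, 0.0, 0.0)], (0.0, 0.0, 0.0), s)
print(s, scaled)
```

After such a rescaling all shapes have the same volume, so any remaining metric distance to the template would reflect shape rather than size.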

Differences and changes (over time) in morphometry can also be used for diagnostic discrimination of 
subjects in non-demented or demented groups. Many discrimination techniques such as Fisher's linear 
discriminant functions, support vector machines, and logistic discrimination can be applied to the metric distances, 
together with other qualitative variables. In this study we applied logistic discrimination based on metric 
distances, as logistic regression not only provides a means for classification, but also yields a probability estimate 
for having DAT. Furthermore, one can optimize the threshold probability for a particular cost function for the 
entire training data set, or by a cross-validation technique. The correct classification rate of the hippocampi 
was about 70% in our logistic regression analysis. In [57], PCA of the initial momentum of the same data set 
led to correct classification of 12 out of 18 (i.e., 67% of the) demented subjects and 22 out of 26 (i.e., 85% 
of the) control subjects. Metric distances can be used to distinguish AD from normal aging quantitatively; 
however, to be able to use it for diagnostic purposes, the method should be improved to a greater extent. 
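
The two rates quoted from [57] can be combined into an overall correct classification rate, which makes the comparison with the roughly 70% figure above more direct. This is plain arithmetic on the counts given in the text:

```latex
\text{sensitivity} = \frac{12}{18} \approx 67\%, \qquad
\text{specificity} = \frac{22}{26} \approx 85\%, \qquad
\text{correct classification rate} = \frac{12 + 22}{18 + 26} = \frac{34}{44} \approx 77\%.
```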

We perform a principal component analysis on metric distances and hippocampus, brain, and intracranial 
volumes. Considering the variable loadings, we conclude that volumes are mostly measures of size and partly 
related to shape, while the metric distance is mostly a measure of shape and partly related to size. 
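
Because the principal component analysis here is based on the covariance matrix (see the caption of Table 2), variables measured on large scales dominate the leading components. The two-variable sketch below illustrates this with the overall standard deviations of BV1 and of the left baseline metric distance from Table 1; the correlation of -0.3 is a hypothetical value assumed for illustration, and the closed-form 2×2 eigen-decomposition is a simplification of the four-variable analysis.

```python
import math


def pca_2x2(var_x, var_y, cov_xy):
    """Eigenvalues (descending) and unit first eigenvector of [[var_x, cov_xy], [cov_xy, var_y]]."""
    mean = (var_x + var_y) / 2.0
    radius = math.hypot((var_x - var_y) / 2.0, cov_xy)
    lam1, lam2 = mean + radius, mean - radius
    # Orientation of the major axis of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * cov_xy, var_x - var_y)
    v1 = (math.cos(theta), math.sin(theta))
    return (lam1, lam2), v1


sd_bv = 98408.2  # overall SD of BV1, Table 1 (III)
sd_d = 0.68      # overall SD of left baseline metric distance, Table 1 (V)
rho = -0.3       # hypothetical correlation, assumed for illustration only

(lam1, lam2), v1 = pca_2x2(sd_bv ** 2, sd_d ** 2, rho * sd_bv * sd_d)
share_pc1 = lam1 / (lam1 + lam2)
print(share_pc1, v1)
```

In this sketch the first component is essentially brain volume alone, and the metric distance only appears in a component with negligible variance, which mirrors the pattern of loadings in Table 2. A correlation-based analysis would weight the variables equally instead.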

We also compare the cross-sectional, longitudinal, and discrimination results of LDDMM distances with 
those of volumes. We observe that cross-sectional and longitudinal analyses give similar results, although 
metric distances increase and volumes decrease over time. The metric distance, being an extremely condensed 
summary measure, gives results very similar to those of the hippocampal volume. That is, neither volume nor metric 
distance discriminated left baseline (LB), right baseline (RB), or left followup (LF) between CDR0 and CDR0.5; 
volume reduction and metric distance differences are both significant for CDR0.5 subjects, but neither of 
them is significant for CDR0 subjects; and ANOVA suggested a significant diagnosis group-by-timepoint 
interaction for both measures. On the other hand, we obtain better classification results using volumes 
compared to metric distances. When volume and LDDMM distances are used together, the classification 
results improve compared to results based on volume or distance only. Furthermore, the differential volume 
and distance changes are measured by annual percentage rate of change (APC) for the two year period in 
the study. Similar to the results of [71], we found that APC in volumes may be a good indicator for early 
stage of DAT. However, APC in LDDMM distances does not provide good performance in classification of 
CDR0 versus CDR0.5 hippocampi. Comparing the discrimination results, we found that the classifier based 
on volume, distance, and APC in volume has the best performance. Hence these measures may constitute a 
reliable biomarker when used together. 

In conclusion, we have presented detailed statistical analysis of metric distances computed with LDDMM 
and show that this is potentially a powerful tool in detecting morphometric changes between diagnosis groups 
or changes in morphometry over time. Furthermore, we avoid the single subject analysis, which might be 
of greater interest clinically. Metric distances depend on the choice of template anatomy used. However, in 
this article we do not address the issue of template selection for optimal differentiation between hippocampus 
morphometry. 

Tables and Figures 





I- Summary Information of Subjects 

           Gender (M/F)   Age (years)    Scan interval (years)   Education (years) 
                          (mean ± SD)    ([Min-Max])             (mean ± SD) 
CDR0       12/14          75.2 ± 7.0     2.2 [1.4-4.1]           14.8 ± 2.7 
CDR0.5     11/7           75.7 ± 4.4     2.0 [1.0-2.6]           13.7 ± 2.8 
overall    23/21          75.4 ± 6.1     2.1 [1.0-4.1]           14.3 ± 2.8 
p_L        NA             0.4224         NA                      0.0001 
p_W        NA             0.8202         NA                      0.2101 


II- Summary Statistics for Metric Distances at Baseline and Follow-up 

                     Mean ± SD      Min     Q1      Median   Q3      Max 
Left-$d^b$ (LDB)     3.40 ± 0.68    1.97    3.00    3.30     3.65    5.08 
Left-$d^f$ (LDF)     3.57 ± 0.77    2.26    2.99    3.48     4.02    6.03 
Right-$d^b$ (RDB)    3.65 ± 0.67    1.73    3.32    3.53     4.09    4.98 
Right-$d^f$ (RDF)    4.05 ± 0.67    2.96    3.72    3.95     4.34    5.71 


III- Mean ± SD Values of Brain and Intracranial Volumes 

           BV1                    BV3                     ICV1                    ICV3 
CDR0       1006892 ± 104214.0     1003319.4 ± 101129.0    1407972 ± 156067.1      1464494 ± 177496.0 
CDR0.5     1003850 ± 92293.4      993380.8 ± 95425.0      1408507 ± 134912.6      1454966 ± 138931.2 
overall    1005647 ± 98408.2      999253.6 ± 97828.6      1408191 ± 146140.3      1460596 ± 161152.7 
p_L        0.2302                 0.0079                  0.0503                  0.1070 
p_W        0.5192                 0.3277                  0.7929                  0.8299 


IV- Mean ± SD of Hippocampal Volumes 

           LB                LF                RB                RF 
CDR0       2081.4 ± 354.8    2081.4 ± 354.8    2081.4 ± 354.8    2081.4 ± 354.8 
CDR0.5     1717.6 ± 224.8    1717.6 ± 224.8    1717.6 ± 224.8    1717.6 ± 224.8 
overall    1932.6 ± 354.8    1932.6 ± 354.8    1932.6 ± 354.8    1932.6 ± 354.8 
p_L        0.3528            0.0268            0.2001            0.2359 
p_W        0.0003            0.0001            0.0149            0.0004 


V- Mean ± SD of Metric Distances

          LB             LF             RB             RF
CDR0      3.34 ± 0.62    3.41 ± 0.54    3.63 ± 0.57    3.83 ± 0.47
CDR0.5    3.48 ± 0.76    3.82 ± 0.98    3.68 ± 0.81    4.37 ± 0.78
overall   3.40 ± 0.68    3.57 ± 0.77    3.65 ± 0.67    4.05 ± 0.67
p_L       0.0498         0.4718         0.2891         0.1084
p_W       0.5994         0.1590         0.9145         0.02058



Table 1: Summary information of subjects (I); summary statistics for metric distances at baseline and 
follow-up, where SD stands for standard deviation and Q1 and Q3 stand for the first and third quartiles (II); 
means and SDs of brain and intracranial volumes by diagnosis group (III); means and SDs of hippocampal 
volumes by diagnosis group (IV); and means and SDs of metric distances by diagnosis group (V). p_L: p-value 
based on Lilliefors' test of normality; p_W: p-value based on Wilcoxon rank sum test; NA: not applicable; BV1 
(BV3): brain volume at baseline (follow-up); ICV1 (ICV3): intracranial volume at baseline (follow-up); LB: left 
baseline; LF: left follow-up; RB: right baseline; and RF: right follow-up. 



Importance of Components

             PC1      PC2      PC3      PC4
Prop. Var    .9877    .0123    ≈ 0.0    ≈ 0.0
Cum. Prop    .9877    ≈ 1.0    ≈ 1.0    1.0

Variable Loadings

             PC1      PC2      PC3      PC4
HLV1         ≈ 0.0    ≈ 0.0    1.00     ≈ 0.0
HLM1         ≈ 0.0    ≈ 0.0    ≈ 0.0    1.00
BV1          .55      -.83     ≈ 0.0    ≈ 0.0
ICV1         .83      .55      ≈ 0.0    ≈ 0.0



Table 2: The importance of principal components and variable loadings from the principal component analysis 
of metric distances and volumes of left hippocampi at baseline with eigenvalues based on the covariance matrix. 
PCi stands for principal component i for i = 1,2,3,4; Prop. Var: proportion of variance explained by the 
principal components; Cum. Prop: cumulative proportion of the variance explained by the particular principal 
component; HLV1: volume of left hippocampus at baseline; HLM1: metric distance of left hippocampus at 
baseline; BV1: brain volume at baseline; ICV1: intracranial volume at baseline. 
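
One way to see why the covariance-based analysis in Table 2 is dominated by brain and intracranial volume is to compare the raw variances of the four variables. The short sketch below is only illustrative: it takes the overall baseline SDs reported in Table 1 as approximate inputs (the dictionary and function names are ours, not the authors') and computes each variable's share of the total variance, i.e., of the trace of the covariance matrix.

```python
# Overall baseline standard deviations as reported in Table 1;
# the keys follow the abbreviations used in Table 2.
sd = {
    "HLV1": 354.8,      # volume of left hippocampus at baseline
    "HLM1": 0.68,       # metric distance of left hippocampus at baseline
    "BV1": 98408.2,     # brain volume at baseline
    "ICV1": 146140.3,   # intracranial volume at baseline
}


def variance_shares(sd_by_name):
    """Return each variable's fraction of the total variance (matrix trace)."""
    variances = {name: s ** 2 for name, s in sd_by_name.items()}
    total = sum(variances.values())
    return {name: v / total for name, v in variances.items()}


shares = variance_shares(sd)
for name, share in shares.items():
    print(f"{name}: {share:.6f}")
```

Under these inputs, brain and intracranial volume together carry more than 99.99% of the total variance, so the leading components of a covariance-based analysis essentially reproduce those two variables. This is consistent with the loadings in Table 2 and is a common reason for preferring the correlation matrix when the variables are on very different scales.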



                     Baseline                               Followup

Importance of Components

             PC1      PC2      PC3      PC4          PC1      PC2      PC3      PC4
Prop. Var    .57      .27      .15      .01          .54      .32      .12      .02
Cum. Prop.   .57      .84      .99      1.0          .54      .86      .98      1.0

Variable Loadings

             PC1      PC2      PC3      PC4          PC1      PC2      PC3      PC4
HLV          .37      .59      .71      ≈ 0.0        .41      .55      .73      ≈ 0.0
HLM          .22      -.80     .55      ≈ 0.0        ≈ 0.0    -.80     .60      ≈ 0.0
BV           .64      ≈ 0.0    -.27     -.72         .65      -.16     -.18     -.72
ICV          .63      ≈ 0.0    -.33     .70          .64      -.17     -.29     .69



Table 3: The importance of principal components and variable loadings from the principal component analysis 
of metric distances and volumes of left hippocampi at baseline and followup with eigenvalues based on the 
correlation matrix. HLV: volume of left hippocampus; HLM: metric distance of left hippocampus; BV: brain 
volume; ICV: intracranial volume. The other abbreviations are as in Table 2. 



                     Baseline                               Followup

Importance of Components

             PC1      PC2      PC3      PC4          PC1      PC2      PC3      PC4
Prop. Var    .54      .33      .12      .01          .57      .35      .07      .02
Cum. Prop.   .54      .87      .99      1.0          .57      .92      .98      1.0

Variable Loadings

             PC1      PC2      PC3      PC4          PC1      PC2      PC3      PC4
HRV          .35      .61      .71      ≈ 0.0        .50      .46      .72      .13
HRM          ≈ 0.0    -.78     .62      ≈ 0.0        -.30     -.70     .64      ≈ 0.0
BV           .66      ≈ 0.0    -.20     -.71         .58      -.37     ≈ 0.0    -.72
ICV          .66      -.13     -.26     .70          .57      -.40     -.25     .67



Table 4: The importance of principal components and variable loadings from the principal component analysis 
of metric distances and volumes of right hippocampi at baseline and followup with eigenvalues based on the 
correlation matrix. HRV: volume of right hippocampus; HRM: metric distance of right hippocampus; BV: 
brain volume; ICV: intracranial volume. The other abbreviations are as in Table 2. 



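Compound Symmetry

σ^2 + σ_1
σ_1          σ^2 + σ_1
σ_1          σ_1          σ^2 + σ_1
σ_1          σ_1          σ_1          σ^2 + σ_1

Autoregressive

σ^2
σ^2 ρ        σ^2
σ^2 ρ^2      σ^2 ρ        σ^2
σ^2 ρ^3      σ^2 ρ^2      σ^2 ρ        σ^2

Unstructured

σ_1^2
σ_21         σ_2^2
σ_31         σ_32         σ_3^2
σ_41         σ_42         σ_43         σ_4^2

Autoregressive Heterogeneous Variances

σ_1^2
σ_2 σ_1 ρ    σ_2^2
σ_3 σ_1 ρ^2  σ_3 σ_2 ρ    σ_3^2
σ_4 σ_1 ρ^3  σ_4 σ_2 ρ^2  σ_4 σ_3 ρ    σ_4^2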



Table 5: The Var-Cov structures for the repeated measures ANOVA analysis on the metric distances; σ^2 is 
the common variance term, σ_i^2 is the variance for repeated factor i, σ_ij is the covariance between repeated 
factors i and j, and ρ is the first-order correlation coefficient in an autoregressive model. 
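
The structures named in Table 5 can be sketched numerically as follows. This is an illustration under assumptions rather than the authors' implementation: it uses the standard textbook forms (equal variances and equal covariances for compound symmetry; correlation ρ^|i-j| for the autoregressive forms), four repeated measures, and hypothetical function names and parameter values of our own choosing.

```python
def compound_symmetry(var, cov, n=4):
    """Compound symmetry: one common variance and one common covariance."""
    return [[var if i == j else cov for j in range(n)] for i in range(n)]


def ar1(sigma2, rho, n=4):
    """First-order autoregressive: common variance, correlation rho**|i-j|."""
    return [[sigma2 * rho ** abs(i - j) for j in range(n)] for i in range(n)]


def arh1(sigmas, rho):
    """Heterogeneous AR(1): separate SD per measure, correlation rho**|i-j|."""
    n = len(sigmas)
    return [[sigmas[i] * sigmas[j] * rho ** abs(i - j) for j in range(n)]
            for i in range(n)]


def n_cov_params(structure, n=4):
    """Number of free covariance parameters for n repeated measures."""
    return {"CS": 2, "AR": 2, "ARH": n + 1, "UN": n * (n + 1) // 2}[structure]


# Hypothetical values, chosen only to show the pattern of each matrix.
for row in ar1(sigma2=1.0, rho=0.5):
    print(row)
```

With four repeated measures the parameter counts are 2 (CS), 2 (AR), 5 (ARH), and 10 (UN), so the unstructured form is the most flexible and the most expensive to estimate.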



Model   Var-Cov   df   AIC     BIC     Log Likelihood   Test     L. Ratio   p-value
1       CS        10   362.8   394.1   -171.4
2       UN        18   352.9   409.1   -158.4           1 vs 2   25.9       0.0011
3       ARH       13   350.9   391.5   -162.5           1 vs 3   17.92      < 0.0001
4       AR        10   347.1   378.4   -163.6









Table 6: Model selection criteria results for models with compound symmetry (CS), unstructured (UN), 
autoregressive (AR), and autoregressive heterogeneous (ARH) Var-Cov structures. df = error degrees of 
freedom, AIC = Akaike information criterion, BIC = Bayesian information criterion, L. Ratio = likelihood ratio. 
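
The criteria in Table 6 are linked by simple formulas, which the following sketch illustrates. It assumes that the df column counts the estimated parameters of each model (an assumption on our part, although it reproduces the reported AIC values to rounding) and that the likelihood ratio statistic is referred to a chi-square distribution whose degrees of freedom equal the difference in df. The function names are ours, and the chi-square tail probability is computed with a closed-form expression valid for integer degrees of freedom.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * log_lik


def lr_statistic(log_lik_small, log_lik_large):
    """Likelihood ratio statistic for a smaller model nested in a larger one."""
    return 2 * (log_lik_large - log_lik_small)


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite sum over odd double factorials.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(df // 2):
        total += term
        term *= x / (2 * j + 3)
    return total


# Model 1 (CS) versus model 2 (UN), using the log likelihoods in Table 6.
print(aic(-171.4, 10))                 # AIC of model 1
stat = lr_statistic(-171.4, -158.4)    # close to the tabulated 25.9
print(stat, chi2_sf(stat, 18 - 10))
```

Because the tabulated log likelihoods are rounded to one decimal place, the statistic recomputed from them differs slightly from the reported likelihood ratio; the resulting p-value for model 1 versus model 2 is nevertheless close to the tabulated 0.0011.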



Independent Group Comparisons of LDDMM Distances

                        p-values for t-test                  p-values for Wilcoxon test
Groups                  2-sided    1st < 2nd   1st > 2nd     2-sided    1st < 2nd   1st > 2nd
LB-CDR0.5,LB-CDR0       .5362      .7319       .2681         .6078      .7044       .3039
LF-CDR0.5,LF-CDR0       .1179      .9410       .0590         .1625      .9223       .0813
RB-CDR0.5,RB-CDR0       .8176      .5912       .4088         .9239      .5475       .462
RF-CDR0.5,RF-CDR0       .0148*     .9926       .0074*        .0212*     .99         .0106*

Dependent Group Comparisons of LDDMM Distances

                        p-values for paired t-test           p-values for paired Wilcoxon test
Groups                  2-sided    1st < 2nd   1st > 2nd     2-sided    1st < 2nd   1st > 2nd
LB-CDR0.5,LF-CDR0.5     .0259*     .0129*      .9871         .0311*     .0155*      .9861
RB-CDR0.5,RF-CDR0.5     .0002*     .0001*      .9999         .0005*     .0002*      .9998
LB-CDR0,LF-CDR0         .5958      .2979       .7021         .7127      .3563       .6531
RB-CDR0,RF-CDR0         .1241      .0621       .9379         .1244      .0622       .9409



Table 7: The p-values based on independent sample t-test (top) and Wilcoxon rank sum test (middle) for both 
left and right metric distances and p-values based on paired t-tests for both left and right metric distances 
(bottom). Notice that we use t-tests when the assumptions (such as normality or homogeneity of variances) 
hold; otherwise we use the Wilcoxon test. Significant p-values at the 0.05 level are marked with an asterisk (*). 



Correlation Coefficients of Baseline vs Follow-up Distances

Groups                  r_P                ρ_S                τ_K
LDB,LDF                 .6642 (<.0001*)    .5686 (<.0001*)    .3843 (.0001*)
RDB,RDF                 .5076 (.0002*)     .3835 (.0053*)     .2754 (.0042*)
LB-CDR0.5,LF-CDR0.5     .7988 (<.0001*)    .8147 (<.0001*)    .6164 (.0002*)
RB-CDR0.5,RF-CDR0.5     .6995 (.0006*)     .7028 (.0008*)     .5556 (.0004*)
LB-CDR0,LF-CDR0         .4929 (.0053*)     .3455 (.0420*)     .2102 (.0661)
RB-CDR0,RF-CDR0         .2812 (.0820)      .1888 (.1767)      .1418 (.1549)



Table 8: The correlation coefficients and the associated p-values for the one-sided (correlation greater than 
zero) alternatives. r_P = Pearson's correlation coefficient, ρ_S = Spearman's rank correlation coefficient, and 
τ_K = Kendall's rank correlation coefficient. Significant p-values at the 0.05 level are marked with an asterisk (*). 



Correlation Coefficients of Distances of Left vs Right Hippocampi

Groups                  r_P                ρ_S                τ_K
LDB,RDB                 .4017 (.0034*)     .27 (.0382*)       .1749 (.0471*)
LDF,RDF                 .3441 (.0111*)     .2009 (.0952)      .1241 (.1175)
LB-CDR0.5,RB-CDR0.5     .3312 (.0492*)     .3813 (.0277*)     .2191 (.0582)
LF-CDR0.5,RF-CDR0.5     .1033 (.3078)      .0400 (.4225)      .0340 (.4039)
LB-CDR0,RB-CDR0         .3312 (.0492*)     .3813 (.0277*)     .2191 (.0582)
LF-CDR0,RF-CDR0         .1033 (.3078)      .0400 (.4225)      .0340 (.4039)



Table 9: The correlation coefficients and the associated p-values for the one-sided (correlation greater than 
zero) alternatives. r_P, ρ_S, and τ_K stand for Pearson's, Spearman's, and Kendall's correlation coefficients, 
respectively. Significant p-values at the 0.05 level are marked with an asterisk (*). 





p-values for cdf comparisons of Distances

Groups                  p_KS (2-s)   p_KS (l)   p_KS (g)   p_C       p_CvM
LB-CDR0.5,LB-CDR0       .6932        .364       .5706      .4325     .5112
RB-CDR0.5,RB-CDR0       .8997        .5204      .5875      .6684     .7098
LF-CDR0.5,LF-CDR0       .1208        .0604      .5706      .0495*    .0665
RF-CDR0.5,RF-CDR0       .0517*       .0259*     .9365      .0095*    .0235*



Table 10: The p-values for the K-S, Cramer's, and Cramer-von Mises tests. p_KS (2-s), p_KS (l), and p_KS (g) 
stand for the p-values based on the K-S test for the two-sided alternative, the first cdf less than the second, and 
the first cdf greater than the second alternatives, respectively; p_C and p_CvM stand for the p-values for Cramer's 
test and the Cramer-von Mises test, respectively. Significant p-values at the 0.05 level are marked with an asterisk (*). 





A                  Truth                          B                  Truth
Predict     CDR0     CDR0.5    Total              Predict     CDR0     CDR0.5    Total
CDR0        95       52        147                CDR0        18       8         26
CDR0.5      9        20        29                 CDR0.5      8        10        18
Total       104      72        176                Total       26       18        44

C                  Truth                          D                  Truth
Predict     CDR0     CDR0.5    Total              Predict     CDR0     CDR0.5    Total
CDR0        22       8         30                 CDR0        22       10        32
CDR0.5      4        10        14                 CDR0.5      4        8         12
Total       26       18        44                 Total       26       18        44



Table 11: The classification matrices using metric distances in logistic regression with threshold p_o = 0.5: A 
= classification matrix of all hippocampi using logistic model M_II(D) using hippocampal LDDMM metric 
distances in Equation (9); B = classification matrix of subjects using logistic model M_II(D) in Equation (9) 
with one hippocampus MRI classified as CDR0.5 being sufficient to label CDR0.5; C = classification matrix of 
subjects using logistic model M_III(D) in Equation (10) that only uses follow-up hippocampus MRIs and one 
hippocampus sufficient to label CDR0.5; D = classification matrix of subjects using logistic model M_IV(D) 
in Equation (11) that only uses follow-up right hippocampus MRIs. 





          A      B      C*     D
P_CCR     65%    64%    73%    68%
P_sens    28%    56%    56%    44%
P_spec    91%    69%    85%    85%



Table 12: The correct classification rates (P_CCR), sensitivity (P_sens), and specificity (P_spec) percentages 
with p_o = 0.50 for the classification procedures A-D in Table 11. The model with the best classification 
performance is marked with an asterisk (*). 
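
The percentages in Table 12 follow directly from the classification matrices A-D in Table 11. The sketch below shows the arithmetic, treating CDR0.5 as the positive class and reading each matrix with predicted labels in rows and true labels in columns; the function and variable names are ours.

```python
# Each matrix is ((pred CDR0 & true CDR0, pred CDR0 & true CDR0.5),
#                 (pred CDR0.5 & true CDR0, pred CDR0.5 & true CDR0.5)),
# with the counts copied from Table 11.
matrices = {
    "A": ((95, 52), (9, 20)),
    "B": ((18, 8), (8, 10)),
    "C": ((22, 8), (4, 10)),
    "D": ((22, 10), (4, 8)),
}


def classification_rates(matrix):
    """Return (correct classification rate, sensitivity, specificity)."""
    (true_neg, false_neg), (false_pos, true_pos) = matrix
    total = true_neg + false_neg + false_pos + true_pos
    p_ccr = (true_neg + true_pos) / total
    p_sens = true_pos / (true_pos + false_neg)   # CDR0.5 correctly detected
    p_spec = true_neg / (true_neg + false_pos)   # CDR0 correctly detected
    return p_ccr, p_sens, p_spec


for label, matrix in matrices.items():
    p_ccr, p_sens, p_spec = classification_rates(matrix)
    print(f"{label}: CCR {p_ccr:.0%}, sens {p_sens:.0%}, spec {p_spec:.0%}")
```

Rounded to whole percentages, these values agree with the entries of Table 12, for example 73%, 56%, and 85% for procedure C.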





                      p_o = 1/2                                     p_o = 18/44
          M_I(D)    M_II(D)   M_III(D)*  M_IV(D)        M_I(D)    M_II(D)   M_III(D)   M_IV(D)*
P_CCR     66%       64%       73%        68%            57%       47%       66%        68%
P_sens    56%       56%       56%        44%            83%       67%       67%        61%
P_spec    73%       69%       85%        85%            38%       35%       65%        73%

Using optimum p_o based on cost function C_1(p_o, w_1, w_2) with
                      w_1 = w_2 = 1                                 w_1 = 1, w_2 = 3
          M_I(D)    M_II(D)   M_III(D)   M_IV(D)*       M_I(D)    M_II(D)   M_III(D)   M_IV(D)*
p_opt     .51       .50       .45        .38            .51       .47       .36,.37    .38
P_CCR     68%       64%       70%        70%            68%       57%       68%        70%
P_sens    56%       56%       67%        72%            56%       61%       78%        72%
P_spec    77%       69%       73%        69%            77%       54%       61%        69%

Using optimum p_o based on cost function C_2(p_o, η_1, η_2) with
                      η_1 = η_2 = 0.5                               η_1 = .3, η_2 = 0.7
          M_I(D)    M_II(D)   M_III(D)   M_IV(D)*       M_I(D)    M_II(D)   M_III(D)*  M_IV(D)
p_opt     .81-.82   .76-.78   .50-.52    .38            .37       .37       .33-.34    .22-.29
P_CCR     75%       73%       73%        70%            59%       61%       66%        55%
P_sens    39%       39%       56%        72%            95%       100%      89%        89%
P_spec    100%      96%       85%        69%            35%       35%       50%        31%



Table 13: The correct classification rates (P_CCR), sensitivity (P_sens), and specificity (P_spec) percentages for 
the classification procedures based on models M_I(D) - M_IV(D) using hippocampal LDDMM metrics and 
volumes with threshold probabilities p_o = 1/2 and p_o = 18/44 (top); with optimum threshold values p_o = p_opt 
based on the cost function C_1(p_o, w_1, w_2) with w_1 = w_2 = 1 and w_1 = 1, w_2 = 3 (middle); and with optimum 
threshold values p_o = p_opt based on the cost function C_2(p_o, η_1, η_2) with η_1 = η_2 = 0.5 and η_1 = .3, η_2 = 0.7 
(bottom). The model with the best classification performance is marked with an asterisk (*). 



Volume comparisons

                        p-values for t-test                    p-values for Wilcoxon test
Groups                  2-sided     1st < 2nd   1st > 2nd      2-sided     1st < 2nd   1st > 2nd
LB-CDR0.5,LB-CDR0       .0002*      .0001*      .9999          .0002*      .0001*      .9999
LF-CDR0.5,LF-CDR0       < .0001*    < .0001*    ≈ 1.000        < .0001*    < .0001*    ≈ 1.000
RB-CDR0.5,RB-CDR0       .0025*      .0012*      .9998          .0143*      .0071*      .9933
RF-CDR0.5,RF-CDR0       .0001*      < .0001*    ≈ 1.000        .0002*      .0001*      .9999

                        p-values for paired t-test             p-values for paired Wilcoxon test
Groups                  2-sided     1st < 2nd   1st > 2nd      2-sided     1st < 2nd   1st > 2nd
LB-CDR0.5,LF-CDR0.5     < .0001*    ≈ 1.000     < .0001*       .0001*      ≈ 1.000     < .0001*
RB-CDR0.5,RF-CDR0.5     < .0001*    ≈ 1.000     < .0001*       < .0001*    ≈ 1.000     < .0001*
LB-CDR0,LF-CDR0         < .0001*    ≈ 1.000     < .0001*       .0001*      .9999       < .0001*
RB-CDR0,RF-CDR0         < .0001*    .9999       .0001*         .0002*      .9999       .0001*
LB-CDR0.5,RB-CDR0.5     < .0001*    < .0001*    ≈ 1.000        < .0001*    < .0001*    ≈ 1.000
LF-CDR0.5,RF-CDR0.5     < .0001*    < .0001*    ≈ 1.000        .0001*      .0001*      ≈ 1.000
LB-CDR0,RB-CDR0         < .0001*    < .0001*    ≈ 1.000        < .0001*    < .0001*    ≈ 1.000
LF-CDR0,RF-CDR0         < .0001*    < .0001*    ≈ 1.000        < .0001*    < .0001*    ≈ 1.000



Table 14: The p-values based on independent sample t-test (top) and Wilcoxon rank sum test (middle) 
for both left and right hippocampus volumes and p-values based on paired t-tests for both left and right 
hippocampus volumes (bottom). Significant p-values at the 0.05 level are marked with an asterisk (*). 





                      p_o = 1/2                                     p_o = 18/44
          M_I(V)    M_II(V)   M_III(V)*  M_IV(V)        M_I(V)    M_II(V)   M_III(V)   M_IV(V)*
P_CCR     68%       70%       73%        80%            70%       68%       64%        78%
P_sens    83%       89%       83%        72%            89%       89%       100%       78%
P_spec    58%       58%       65%        85%            58%       54%       38%        77%

Using optimum p_o based on cost function C_1(p_o, w_1, w_2) with
                      w_1 = w_2 = 1                                 w_1 = 1, w_2 = 3
          M_I(V)    M_II(V)   M_III(V)   M_IV(V)*       M_I(V)    M_II(V)   M_III(V)   M_IV(V)*
p_opt     .64-.66   .62-.63   .55-.58    .35-.42        .61-.62   .58-.60   .55-.58    .35-.36
P_CCR     82%       75%       77%        80%            80%       73%       77%        80%
P_sens    78%       78%       83%        89%            83%       83%       83%        89%
P_spec    85%       73%       73%        73%            77%       65%       73%        73%

Using optimum p_o based on cost function C_2(p_o, η_1, η_2) with
                      η_1 = η_2 = 0.5                               η_1 = .3, η_2 = 0.7
          M_I(V)    M_II(V)   M_III(V)   M_IV(V)*       M_I(V)    M_II(V)   M_III(V)   M_IV(V)*
p_opt     .64-.66   .70       .69        .35-.36        .31       .31       .25-.32    .26
P_CCR     82%       80%       82%        80%            70%       66%       70%        77%
P_sens    78%       61%       61%        89%            100%      100%      94%        94%
P_spec    85%       92%       96%        73%            50%       42%       54%        65%



Table 15: The correct classification rates (P_CCR), sensitivity (P_sens), and specificity (P_spec) percentages 
for the classification procedures based on models M_I(V) - M_IV(V) using hippocampal LDDMM metrics and 
volumes with threshold probabilities p_o = 1/2 and p_o = 18/44 (top); with optimum threshold values p_o = p_opt 
based on the cost function C_1(p_o, w_1, w_2) with w_1 = w_2 = 1 and w_1 = 1, w_2 = 3 (middle); and with optimum 
threshold values p_o = p_opt based on the cost function C_2(p_o, η_1, η_2) with η_1 = η_2 = 0.5 and η_1 = .3, η_2 = 0.7 
(bottom). The model with the best classification performance is marked with an asterisk (*). 





                        p_o = 1/2                                           p_o = 18/44
          M_I(V,D)   M_II(V,D)  M_III(V,D)  M_IV(V,D)*      M_I(V,D)   M_II(V,D)  M_III(V,D)  M_IV(V,D)*
P_CCR     75%        66%        75%         82%             66%        66%        73%         77%
P_sens    89%        89%        83%         78%             89%        89%        89%         83%
P_spec    65%        50%        69%         85%             50%        50%        62%         73%

Using optimum p_o based on cost function C_1(p_o, w_1, w_2) with
                        w_1 = w_2 = 1                                       w_1 = 1, w_2 = 3
          M_I(V,D)*  M_II(V,D)  M_III(V,D)  M_IV(V,D)       M_I(V,D)*  M_II(V,D)  M_III(V,D)  M_IV(V,D)
p_opt     .64-.65    .66        .48-.58     .48             .64-.65    .66        .48-.54     .28-.33
P_CCR     84%        84%        75%         84%             84%        84%        75%         80%
P_sens    89%        83%        83%         83%             89%        83%        83%         89%
P_spec    81%        85%        69%         85%             81%        85%        69%         73%

Using optimum p_o based on cost function C_2(p_o, η_1, η_2) with
                        η_1 = η_2 = 0.5                                     η_1 = .3, η_2 = 0.7
          M_I(V,D)*  M_II(V,D)  M_III(V,D)  M_IV(V,D)       M_I(V,D)*  M_II(V,D)  M_III(V,D)  M_IV(V,D)
p_opt     .64-.65    .66        .68-.72     .48             .64-.65    .66        .23         .20-.22
P_CCR     84%        84%        80%         84%             84%        84%        70%         75%
P_sens    89%        83%        61%         83%             89%        83%        100%        94%
P_spec    81%        85%        92%         85%             81%        85%        50%         61%



Table 16: The correct classification rates (P_CCR), sensitivity (P_sens), and specificity (P_spec) percentages for 
the classification procedures based on models M_I(V,D) - M_IV(V,D) using hippocampal LDDMM metrics 
and volumes with threshold probabilities p_o = 1/2 and p_o = 18/44 (top); with optimum threshold values 
p_o = p_opt based on the cost function C_1(p_o, w_1, w_2) with w_1 = w_2 = 1 and w_1 = 1, w_2 = 3 (middle); and 
with optimum threshold values p_o = p_opt based on the cost function C_2(p_o, η_1, η_2) with η_1 = η_2 = 0.5 and 
η_1 = .3, η_2 = 0.7 (bottom). The model with the best classification performance is marked with an asterisk 
(*). 





                        p_o = 1/2                                  p_o = 18/44
          M_I(V^APC)*  M_II(V^APC)  M_III(V^APC)      M_I(V^APC)   M_II(V^APC)  M_III(V^APC)
P_CCR     75%          80%          75%               64%          73%          73%
P_sens    72%          61%          61%               83%          61%          61%
P_spec    77%          92%          85%               50%          81%          81%

Using optimum p_o based on cost function C_1(p_o, w_1, w_2) with
                        w_1 = w_2 = 1                              w_1 = 1, w_2 = 3
          M_I(V^APC)*  M_II(V^APC)  M_III(V^APC)      M_I(V^APC)*  M_II(V^APC)  M_III(V^APC)
p_opt     .55-.55      .38-.39      .27-.28           .54-.55      .34          .25
P_CCR     80%          75%          75%               80%          73%          73%
P_sens    72%          72%          78%               72%          83%          83%
P_spec    85%          81%          73%               85%          65%          65%

Using optimum p_o based on cost function C_2(p_o, η_1, η_2) with
                        η_1 = η_2 = 0.5                            η_1 = .3, η_2 = 0.7
          M_I(V^APC)*  M_II(V^APC)  M_III(V^APC)      M_I(V^APC)   M_II(V^APC)  M_III(V^APC)
p_opt     .54-.55      .82-.85      .61-.69           .39          .34          .18-.21
P_CCR     80%          82%          82%               64%          73%          70%
P_sens    72%          56%          56%               89%          83%          100%
P_spec    85%          100%         100%              46%          65%          50%



Table 17: The correct classification rates (P_CCR), sensitivity (P_sens), and specificity (P_spec) percentages for 
the classification procedures based on models M_I(V^APC) - M_III(V^APC) using APC in hippocampal volumes 
with threshold probabilities p_o = 1/2 and p_o = 18/44 (top); with optimum threshold values p_o = p_opt based 
on the cost function C_1(p_o, w_1, w_2) with w_1 = w_2 = 1 and w_1 = 1, w_2 = 3 (middle); and with optimum 
threshold values p_o = p_opt based on the cost function C_2(p_o, η_1, η_2) with η_1 = η_2 = 0.5 and η_1 = .3, η_2 = 0.7 
(bottom). The model with the best classification performance is marked with an asterisk (*). 





                        p_o = 1/2                                  p_o = 18/44
          M_I(D^APC)   M_II(D^APC)  M_III(D^APC)      M_I(D^APC)   M_II(D^APC)* M_III(D^APC)
P_CCR     59%          61%          61%               57%          77%          64%
P_sens    27%          28%          28%               72%          72%          56%
P_spec    81%          85%          85%               46%          42%          69%

Using optimum p_o based on cost function C_1(p_o, w_1, w_2) with
                        w_1 = w_2 = 1                              w_1 = 1, w_2 = 3
          M_I(D^APC)*  M_II(D^APC)* M_III(D^APC)      M_I(D^APC)   M_II(D^APC)  M_III(D^APC)
p_opt     .45          .45-.46      .41               .42          .45-.46      .36
P_CCR     66%          66%          66%               61%          66%          64%
P_sens    61%          61%          56%               67%          61%          78%
P_spec    69%          69%          73%               58%          69%          54%

Using optimum p_o based on cost function C_2(p_o, η_1, η_2) with
                        η_1 = η_2 = 0.5                            η_1 = .3, η_2 = 0.7
          M_I(D^APC)   M_II(D^APC)  M_III(D^APC)*     M_I(D^APC)   M_II(D^APC)  M_III(D^APC)
p_opt     .45          .45-.46      .32               .37          .35-.36      .29
P_CCR     66%          66%          64%               52%          55%          59%
P_sens    61%          61%          89%               100%         100%         100%
P_spec    69%          69%          46%               19%          23%          31%



Table 18: The correct classification rates (P_CCR), sensitivity (P_sens), and specificity (P_spec) percentages for 
the classification procedures based on models M_I(D^APC) - M_III(D^APC) using APC in hippocampal LDDMM 
metric distances with threshold probabilities p_o = 1/2 and p_o = 18/44 (top); with optimum threshold 
values p_o = p_opt based on the cost function C_1(p_o, w_1, w_2) with w_1 = w_2 = 1 and w_1 = 1, w_2 = 3 (middle); 
and with optimum threshold values p_o = p_opt based on the cost function C_2(p_o, η_1, η_2) with η_1 = η_2 = 0.5 and 
η_1 = .3, η_2 = 0.7 (bottom). The model with the best classification performance is marked with an asterisk (*). 





                          p_o = 1/2                                         p_o = 18/44
          M_I(V^APC,D)   M_II(V^APC,D)  M_III(V^APC,D)      M_I(V^APC,D)   M_II(V^APC,D)  M_III(V^APC,D)*
P_CCR     73%            77%            77%                 68%            75%            80%
P_sens    83%            67%            56%                 83%            72%            87%
P_spec    65%            85%            92%                 58%            77%            85%

Using optimum p_o based on cost function C_1(p_o, w_1, w_2) with
                          w_1 = w_2 = 1                                     w_1 = 1, w_2 = 3
          M_I(V^APC,D)   M_II(V^APC,D)  M_III(V^APC,D)*     M_I(V^APC,D)   M_II(V^APC,D)  M_III(V^APC,D)
p_opt     .61            .53-.73        .35-.40             .56            .35-.36        .31-.32
P_CCR     84%            82%            80%                 80%            70%            75%
P_sens    72%            67%            78%                 78%            78%            83%
P_spec    92%            92%            81%                 81%            65%            69%

Using optimum p_o based on cost function C_2(p_o, η_1, η_2) with
                          η_1 = η_2 = .5                                    η_1 = .3, η_2 = .7
          M_I(V^APC,D)   M_II(V^APC,D)  M_III(V^APC,D)      M_I(V^APC,D)*  M_II(V^APC,D)  M_III(V^APC,D)
p_opt     .61            .53-.73        .49                 .56            .23-.25        .31-.32
P_CCR     84%            82%            82%                 80%            55%            75%
P_sens    72%            67%            67%                 78%            100%           83%
P_spec    92%            92%            92%                 81%            23%            69%



Table 19: The correct classification rates (P_CCR), sensitivity (P_sens), and specificity (P_spec) percentages for the classification procedures based on models 
M_I(V^APC,D) - M_III(V^APC,D) using metric distance, and APC in hippocampal volumes with threshold probabilities p_o = 1/2 and p_o = 18/44 (top); 
with optimum threshold values p_o = p_opt based on the cost function C_1(p_o, w_1, w_2) with w_1 = w_2 = 1 and w_1 = 1, w_2 = 3 (middle); and with optimum 
threshold values p_o = p_opt based on the cost function C_2(p_o, η_1, η_2) with η_1 = η_2 = 0.5 and η_1 = .3, η_2 = 0.7 (bottom). The model with the best classification 
performance is marked with an asterisk (*). 





                            p_o = 1/2                                                 p_o = 18/44
          M_I(V,V^APC,D)   M_II(V,V^APC,D)*  M_III(V,V^APC,D)*     M_I(V,V^APC,D)   M_II(V,V^APC,D)   M_III(V,V^APC,D)*
P_CCR     86%              89%               89%                   80%              80%               91%
P_sens    94%              89%               89%                   94%              94%               94%
P_spec    81%              88%               88%                   69%              69%               88%

Using optimum p_o based on cost function C_1(p_o, w_1, w_2) with
                            w_1 = w_2 = 1                                             w_1 = 1, w_2 = 3
          M_I(V,V^APC,D)   M_II(V,V^APC,D)   M_III(V,V^APC,D)*     M_I(V,V^APC,D)   M_II(V,V^APC,D)   M_III(V,V^APC,D)*
p_opt     .56-.57          .48-.51           .39-.42               .56-.57          .48-.51           .39-.42
P_CCR     89%              89%               91%                   89%              89%               91%
P_sens    94%              89%               94%                   94%              89%               94%
P_spec    85%              88%               88%                   85%              88%               88%

Using optimum p_o based on cost function C_2(p_o, η_1, η_2) with
                            η_1 = η_2 = .5                                            η_1 = .3, η_2 = .7
          M_I(V,V^APC,D)   M_II(V,V^APC,D)   M_III(V,V^APC,D)*     M_I(V,V^APC,D)   M_II(V,V^APC,D)   M_III(V,V^APC,D)*
p_opt     .56-.57          .48-.51           .39-.42               .56-.57          .48-.51           .39-.42
P_CCR     89%              89%               91%                   89%              89%               91%
P_sens    94%              89%               94%                   94%              89%               94%
P_spec    85%              88%               88%                   85%              88%               88%



Table 20: The correct classification rates (P_CCR), sensitivity (P_sens), and specificity (P_spec) percentages for the classification procedures based on models 
M_I(V,V^APC,D) - M_III(V,V^APC,D) using metric distance, and APC in hippocampal volumes with threshold probabilities p_o = 1/2 and p_o = 18/44 (top); 
with optimum threshold values p_o = p_opt based on the cost function C_1(p_o, w_1, w_2) with w_1 = w_2 = 1 and w_1 = 1, w_2 = 3 (middle); and with optimum 
threshold values p_o = p_opt based on the cost function C_2(p_o, η_1, η_2) with η_1 = η_2 = 0.5 and η_1 = .3, η_2 = 0.7 (bottom). The model with the best classification 
performance is marked with an asterisk (*). 
Figure 1: Change in metric distance during diffeomorphic flow from template (I_0) to target 
(I_1 = φ_1 I_0 = I_0 ∘ φ_1^{-1}). 
Figure 2: Generation of metric distances d_k^{b,f} for subjects k = 1, ..., 44 at baseline (b) and at follow-up 
(f). 
Figure 3: Pairs plots of the continuous variables for the hippocampi at baseline and follow-up. HLV: volume 
of left hippocampus; HRV: volume of right hippocampus; HLM: metric distance for left hippocampus; HRM: 
metric distance for right hippocampus; BV: brain volume; ICV: intracranial volume. The numbers 1 and 3 
stand for year 1 (i.e., baseline) and year 3 (i.e., follow-up), respectively. 
Figure 4: Scatter plots of the metric distances for the left and right distances at baseline and follow-up. The 
metric distances are jittered for better visualization and the crosses represent the mean distance values. 
Figure 5: The left interaction plot is for diagnosis levels over the timepoint levels (the effect of sides is 
ignored). The slope (i.e., the rate of change in morphometry) for CDR0.5 subjects is significantly larger 
than that of CDR0 subjects. The right interaction plot is for side levels over the timepoint levels (the effect of 
diagnosis is ignored). The slopes do not seem to differ significantly between the left and right hippocampi. 
Figure 6: Interaction plots for diagnosis levels over the timepoint levels for left and right metric distances. 
Although the slopes are different for both left and right hippocampi, the difference in the right seems to be 
much larger. 
Figure 7: Empirical cdfs of the metric distances for the CDR0.5 vs CDR0 left and right hippocampus at 
follow-up. 




Figure 8: Fitted probability of having mild dementia (CDR0.5) and observed proportion in metric distances 
with model (9) (top-left); model (10) (top-right); and model (11) (bottom). 



